WILEY MATHEMATICAL 
STATISTICS SERIES 

Walter A. Shewhart, Editor 

Mathematical Statistics 

HOEL—Introduction to Mathematical 
Statistics. 

WALD—Sequential Analysis 

Applied Statistics 

DODGE and ROMIG—Sampling 
Inspection Tables 
RICE—Control Charts 

Related Books of Interest to Statisticians 

HAUSER and LEONARD—Government 
Statistics for Business Use. 



INTRODUCTION TO 
MATHEMATICAL STATISTICS 




By 


PAUL G. HOEL 


Associate Professor of Mathematics 
University of California at Los Angeles 


New York 

JOHN WILEY & SONS, INC. 

CHAPMAN & HALL, LTD. 
LONDON 



Copyright, 1947 

by

Paul G. Hoel


All Rights Reserved

This book or any part thereof must not
be reproduced in any form without
the written permission of the publisher.


FIFTH PRINTING, APRIL, 1949 


PRINTED IN THE UNITED STATES OF AMERICA 



PREFACE 


This book was designed to serve as a textbook for a two-semester
course in mathematical statistics for which elementary calculus is a
prerequisite. It grew out of my experiences in teaching an introductory
course for junior and senior science majors. In the process of
writing it, however, I attempted to keep in mind the needs of applied 
statisticians for a modern reference book on the fundamental methods 
of mathematical statistics. 

The material treated was selected to give the beginner a fairly broad 
introduction to both classical large-sample and modern small-sample 
methods. A number of topics have been treated very briefly because
I did not care to incorporate more material than experience indicated
could be satisfactorily covered in a two-semester course. This compromise
is in harmony with the view that an introduction to such a
rapidly developing subject as statistics should make some attempt to
survey the more important material in the field rather than concentrate
heavily on a few topics. The references at the end of each chapter
were designed to amplify this survey feature.

The organization of material was determined only after trying 
various methods of teaching the subject. For example, a systematic 
approach to testing hypotheses was postponed to one of the last chapters,
in spite of the fact that logically it should have preceded sections
in which hypotheses are tested, because it was found that science 
students obtained a better understanding of the subject from a more 
intuitive introduction. Most of the large-sample methods were placed 
in the first part of the book for the purpose of giving students taking 
only one semester of statistics a fairly unified treatment of at least the 
classical methods. 

Since a background of elementary calculus is not sufficient for deriving
much of the theory, I have not hesitated to state and use without
proof essential theorems that require at least advanced calculus for
their derivations. For the benefit of students with more mathematical
maturity, references to such theorems will be found at the end of the
chapter in which they first occur.

In conclusion, I wish to express my appreciation to Dr. W. A. Shewhart
for his many helpful suggestions.

Paul G. Hoel

Los Angeles, October, 1946

CHAPTER I


INTRODUCTION 

Statistical methods are essentially methods for dealing with data that 
have been obtained by a repetitive operation. For some sets of data, 
the operation that gave rise to the data is clearly of this repetitive type. 
This would be true, for example, of a set of diameters of a certain part 
in a mass-production manufacturing process or of a set of percentages 
obtained from routine chemical analyses. For other sets of data, the 
actual operation may not seem to be repetitive, but it may be possible 
to conceive of it as being so. This would be true for the ages at death 
of certain insurance-policy holders or for the total number of mistakes 
an experimental set of animals made the first time they ran a maze. 

Experience indicates that many repetitive operations behave as 
though they occurred under essentially stable circumstances. Games 
of chance, such as coin tossing or dice rolling, usually exhibit this 
property. Many experiments in the various branches of science do 
likewise. Under such circumstances, it is often possible to construct a 
satisfactory mathematical model of the repetitive operation. This 
model can then be employed to study properties of the operation and 
to draw conclusions concerning it. Such models often prove to be 
useful even though the operation is not highly stable. 

The mathematical model that a statistician selects for a repetitive 
operation is usually one that enables him to make predictions about 
the frequencies with which certain results can be expected in such an 
operation. For example, the model for studying the inheritance of 
color in the breeding of certain flowers might be one that predicted 
three times as many flowers of one color as of another color. The 
extent to which any such model will give valid predictions depends, 
of course, upon how realistic a model it is of the actual operation that 
produces the data. 

In certain types of statistical work, the data to be investigated are 
classified into a number of groups and interest is then centered on the 
number of observations in each group. When data have been so arranged,
they are said to form a frequency distribution. The mathematical
model, then, often consists of a theoretical frequency distribution
which is thought of as corresponding to a population of possible
observations, and the data at hand are thought of as a sample extracted 
from this population. 

Because of the nature of statistical data and models, it is only natural 
that probability should be the fundamental tool in statistical theory. 
From the statistician’s point of view, it is convenient to treat probability
as equivalent to theoretical relative frequency. Thus, the statement
that the probability is p that A will win at a pinball machine is
assumed to imply that, if A could play this machine indefinitely, the
relative frequency of wins would approach p. The nature of this
approach and the philosophical questions arising from this or any other 
view of probability will not be discussed here. Such questions are not 
elementary, and their consideration would detract from the explanation 
of statistical theory at this level. It will merely be assumed that the 
basic laws of probability may be applied to the frequency problems 
that will be considered. 

The science student will quickly discover the similarity between 
statistical procedure and common scientific procedure in which a 
hypothesis is set up, an experiment conducted, and the hypothesis 
tested by means of the experimental results. Beginning with the third 
chapter and continuing throughout the remainder of the book, statistical
hypotheses are set up and tested by means of samples. The last
two chapters in particular are intended to assist the science student 
toward designing his experiments more efficiently by the application 
of certain statistical principles. 

The topics in the first seven chapters of this book are largely concerned
with the theory of certain classical large-sample methods in
statistical theory. They have been arranged according to the number
of variables being studied. First, problems dealing with one variable
are considered, then problems dealing with the relationship between
two variables, and finally problems dealing with more than two variables.
In each problem the descriptive methods of treating data are
considered first, after which the theoretical counterpart, or mathematical
model, is considered. With more than two variables, the
theoretical model becomes somewhat complicated and therefore will 
not be treated here. 

The topics in the last five chapters are largely concerned with the 
theory of certain modern methods in statistics, including in particular 
some of the important small-sample methods. 

REFERENCES 

A fuller discussion of some of the preceding ideas may be found in the following 
two books: 

Wilks, S. S., Mathematical Statistics, Princeton University Press, pp. 1-4.

Kendall, M. G., The Advanced Theory of Statistics, Griffin and Company,
pp. 104-166.



CHAPTER II 


FREQUENCY DISTRIBUTIONS OF ONE VARIABLE 

CLASSIFICATION OF DATA 

Since the statistical methods in this book are methods for dealing 
with repetitive data, it is essential to know what properties of such 
data will prove to be useful. The particular properties that will be 
considered, and hence the types of information that need to be extracted 
from a set of data, depend upon the nature of the data and upon the 
mathematical model that is to be chosen. 

In considering the nature of the data, it is particularly important to 
distinguish between those sets of data for which the order in which the 
observations were obtained yields useful information and those sets 
for which it does not. For example, if one were interested in studying 
weather phenomena from day to day, the order might be very important.
Industrial experience indicates that the information obtained
from considering the order in which articles are manufactured is indispensable
for efficient production. However, if one were interested in
studying certain characteristics of college students and had selected a 
set of students by choosing every twentieth name in a college directory, 
he would hardly expect the order in which the names were obtained to 
be of any value in the study. 

Methods for dealing with data for which order is important will be 
considered in later chapters. In this chapter the emphasis will be 
upon techniques that do not use order information. The material in 
these later chapters will enable the investigator to decide whether he 
is justified in assuming that he may ignore the order information 
present in his data. 

The problem of determining what kind of information is most 
useful for any given mathematical model is not simple and will not 
be considered in this chapter. It will be assumed in this chapter 
that the model is one for which the methods about to be presented 
are appropriate. 

For data of the type being considered, and for certain types of models, 
the information that is particularly useful is often in the form of various 
kinds of averages. These averages are employed to describe the data
and to test hypotheses concerning the population from which the data 
were assumed drawn. For large amounts of data, the computation of 
these averages becomes tedious; consequently, in order to shorten the 
computational time, it is desirable to classify the data into a frequency 
table. For example, suppose one had a record of the weights of 1,000 
college men and desired their mean weight. Much time would be 
saved and very little accuracy lost if these weights were classified into, 
say, 10-pound classes and the mean weight of the classified data computed,
all men in a given class being treated as though they possessed
the same weight. 

If the data are for a discrete variable, there is usually no need for 
classification. Thus, data on the number of petals on flowers of a 
given species, or the number of yeast cells on a square of a hemacytometer,
are naturally classified. There is usually little difficulty in performing
the classification when there appears to be a need for it.

If the data are for a continuous type of variable such as length, or 
weight, or time, they are recorded to a certain digit or decimal accuracy. 
For example, if the diameter of a steel rod is measured to the nearest 
thousandth of an inch, a diameter of 0.431 inch assumes that the “true” 
value lies between 0.4305 and 0.4315 inch. 

In classifying data for a continuous variable, experience indicates 
that for most data it is desirable to use from 10 to 20 classes. With 
less than 10 classes, too much accuracy is lost, whereas with more than 
20 classes the computations become unnecessarily tedious. In order 
to determine boundaries for the various class intervals, it is merely 
necessary to know the smallest and largest observations of the set. 
As an illustration, suppose that 200 steel rods were measured and it 
was found that the smallest and largest diameters were respectively 
0.431 and 0.503 inch. Since the range of values, which is 0.072 inch 
here, is to be divided into 10 to 20 equal intervals, the class interval 
should be chosen as some convenient number between 0.0036 and 
0.0072. A class interval of 0.005 inch will evidently be convenient. 
Since the first class interval should contain the smallest measurement 
of the set, it must begin at least as low as 0.4305. Furthermore, in 
order to avoid having measurements fall on the boundary of two adjacent
class intervals, it is convenient to choose class boundaries to
half a unit beyond the accuracy of the measurements. Thus, in this problem
it would be convenient to choose the first class interval as 0.4305-0.4355.
The remaining class boundaries are then determined by merely
adding the class interval 0.005 repeatedly until the largest measurement
is enclosed in the final interval. If 0.4305-0.4355 is chosen as
the first class interval, there will be 15 class intervals and the last class
interval will be 0.5005-0.5055. When the class boundaries have been
determined, it is a simple matter to list each measurement of the set
in its proper class interval by merely recording a short vertical bar
to represent it. When the number of bars corresponding to each class
interval has been recorded, the data are said to have been classified into
a frequency table. It is assumed in such a classification that all
measurements in a given class interval, say the ith interval, have the
value at the midpoint of the interval. This value is called the class
mark and is denoted by $x_i$. Thus, $x_1 = 0.433$ and $x_{15} = 0.503$ in the
example just considered. The number of measurements found in the
ith class interval is denoted by $f_i$, while the total number of measurements
is denoted by n. Table 1 illustrates the tabulation and resulting
frequency table for the set of steel rods mentioned previously.

TABLE 1

Class boundaries     Class mark x     Frequency f
 0.4305-0.4355          0.433              2
 0.4355-0.4405          0.438              5
 0.4405-0.4455          0.443              7
 0.4455-0.4505          0.448             13
 0.4505-0.4555          0.453             19
 0.4555-0.4605          0.458             27
 0.4605-0.4655          0.463             29
 0.4655-0.4705          0.468             25
 0.4705-0.4755          0.473             23
 0.4755-0.4805          0.478             14
 0.4805-0.4855          0.483             15
 0.4855-0.4905          0.488              9
 0.4905-0.4955          0.493              6
 0.4955-0.5005          0.498              4
 0.5005-0.5055          0.503              2
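
The rule for laying out class intervals can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `class_intervals` and its arguments are hypothetical, and exact decimal arithmetic is used so that boundaries such as 0.4305 are not disturbed by binary rounding.

```python
from decimal import Decimal

def class_intervals(smallest, largest, width, unit):
    """Return (lower, upper, class mark) triples covering smallest..largest.

    The first lower boundary sits half a recording unit below the smallest
    measurement, so no recorded value can fall on a boundary.
    """
    lower = smallest - unit / 2
    intervals = []
    while True:
        upper = lower + width
        intervals.append((lower, upper, (lower + upper) / 2))
        if upper > largest:  # the largest measurement is now enclosed
            break
        lower = upper
    return intervals

# Steel-rod example: smallest 0.431, largest 0.503, interval 0.005,
# measurements recorded to the nearest 0.001 inch.
rods = class_intervals(Decimal("0.431"), Decimal("0.503"),
                       Decimal("0.005"), Decimal("0.001"))
print(len(rods), rods[0], rods[-1])
```

Under these assumptions the sketch yields 15 intervals, beginning with 0.4305-0.4355 and ending with 0.5005-0.5055, in agreement with Table 1.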


It is a common practice for many applied statisticians to indicate 
class intervals in a slightly different form from that suggested above. 
They record not actual class interval boundaries but rather noncontiguous
boundaries. Thus, they would indicate the first three class
intervals by 0.431-0.435, 0.436-0.440, and 0.441-0.445. When interval 
boundaries are so indicated, the true boundaries are ordinarily halfway 
between the upper and lower recorded boundaries of adjacent intervals. 
Another common method of recording class intervals is to employ
common boundaries but to agree that an interval includes measurements
up to but not including the upper boundary. Then the first
three class intervals above would be indicated by 0.431-0.436, 0.436- 
0.441, and 0.441-0.446. A measurement that falls on a boundary is 
placed in the higher of the two intervals. If one knows the accuracy 
of measurement of the variable, there is little difficulty in determining 
the true class boundaries and class marks for these two methods of 
classification. It is important to use the exact class marks; otherwise 
a systematic error will be introduced in many of the computations to 
follow. 
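
As a rough sketch of how the true boundaries and exact class marks may be recovered from noncontiguous recorded boundaries, the following Python fragment applies the halfway rule described above. The function name `true_boundaries` is hypothetical, and the sketch assumes that the gap between adjacent recorded limits is the same throughout the table.

```python
from decimal import Decimal

def true_boundaries(recorded):
    """Convert noncontiguous recorded limits to true class boundaries.

    `recorded` is a list of (lower, upper) pairs such as (0.431, 0.435).
    The true boundary lies halfway between the upper limit of one
    interval and the lower limit of the next.
    """
    gap = recorded[1][0] - recorded[0][1]  # assumed constant across the table
    half = gap / 2
    return [(lo - half, hi + half) for lo, hi in recorded]

recorded = [(Decimal("0.431"), Decimal("0.435")),
            (Decimal("0.436"), Decimal("0.440")),
            (Decimal("0.441"), Decimal("0.445"))]
exact = true_boundaries(recorded)
marks = [(lo + hi) / 2 for lo, hi in exact]  # exact class marks
print(exact, marks)
```

For the three intervals quoted in the text, this reproduces the boundaries 0.4305, 0.4355, 0.4405, and 0.4455 and the class marks 0.433, 0.438, and 0.443 of Table 1.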

GRAPHICAL REPRESENTATION OF FREQUENCY DISTRIBUTIONS 

A type of graph called a histogram is convenient for displaying the
form of a frequency table. Curves are ordinarily reserved for displaying
theoretical frequency distributions. Figure 1, which is the histogram
for the frequency distribution of Table 1, illustrates the nature
of this type of graph. It is to be noted that the relative frequency
with which x occurred in a given interval is given by the ratio of the
area of the rectangle surmounting this interval to the total area of the
histogram. This correspondence between frequency and area is one
of the principal advantages of the histogram for graphing purposes. It
should also be noted that the values of x indicated in Fig. 1 are the
class marks and are located at the midpoints of the intervals.

Fig. 1. Distribution of the diameters of 200 steel rods.

Fortunately, many important frequency distributions to be found 
in nature and industry are of a relatively simple form. They usually 
range from a bell-shaped distribution, like that in Fig. 1, to something
resembling the right half of a bell-shaped distribution. A distribution
of the latter type is said to be heavily skewed, skewness meaning lack
of symmetry with respect to a vertical axis. It will be found, for example,
that the following variables have frequency distributions that
possess such forms in approximately increasing degrees of skewness:
stature, many industrial measurements, weight, age at marriage,
mortality age for certain diseases, wealth. Figures 1, 2, and 3 represent
three such typical distributions with increasing degrees of skewness.

Fig. 2. Distribution of 302,000 marriages classified according to the age of the bridegroom.

Fig. 3. Distribution of 727 deaths from scarlet fever classified according to age.

ARITHMETICAL REPRESENTATION OF FREQUENCY DISTRIBUTIONS

A histogram gives very little accurate quantitative information about 
data; consequently an arithmetical description of the data is desirable. 
For simple frequency distributions, such as those whose graphs are 
given in Figs. 1, 2, and 3, this description is accomplished satisfactorily 
by measuring four characteristics of the distribution: the central tendency,
the variation, the skewness, and the peakedness. Although there
are various quantities in current use for measuring these four characteristics,
experience and theory indicate that the most satisfactory
set of such measures for data of the type being considered is given by 
means of what are known as the first four moments of the variable or 
distribution. Theoretically, higher moments than the first four could 
be used to describe the distribution still more completely, but actually 
these higher moments are so unstable in sampling problems that little 
additional reliable information is obtained from them. 


MOMENTS 

For data that have been classified, the kth moment about the origin
is defined by

(1)    $$m_k' = \frac{1}{n} \sum_{i=1}^{h} x_i^k f_i$$

where $x_i$ is the class mark of the ith class interval, $f_i$ is the frequency
for the ith interval, h is the number of intervals, and n is the total
frequency. For unclassified data all the $f_i$ are equal to 1 and h = n.
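
Definition (1) translates almost directly into code. The following Python sketch is illustrative only: the function name `moment_about_origin` and the small data sets are hypothetical and are not taken from the text.

```python
def moment_about_origin(marks, freqs, k):
    """k-th moment about the origin, formula (1): (1/n) * sum(x_i**k * f_i)."""
    n = sum(freqs)  # total frequency
    return sum((x ** k) * f for x, f in zip(marks, freqs)) / n

# Hypothetical classified data: three class marks with their frequencies.
marks = [10, 20, 30]
freqs = [1, 2, 1]
first = moment_about_origin(marks, freqs, 1)   # (10 + 40 + 30) / 4
second = moment_about_origin(marks, freqs, 2)  # (100 + 800 + 900) / 4

# Unclassified data: every frequency equals 1, so h = n.
raw = [3, 5, 10]
raw_mean = moment_about_origin(raw, [1] * len(raw), 1)
print(first, second, raw_mean)
```

Treating unclassified data as the special case in which every frequency equals 1 keeps a single function for both situations.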


1. The First Moment as a Measure of Central Tendency 


The first moment about the origin, $m_1'$, is called the mean and is
usually denoted by $\bar{x}$; hence

(2)    $$\bar{x} = \frac{1}{n} \sum_{i=1}^{h} x_i f_i$$


For unclassified data, $\bar{x}$ reduces to the familiar formula for the average
of a set of numbers. Formula (2) is sometimes spoken of as the
formula for the weighted mean; however, it is merely a variation of the
familiar form adapted to classified data. Geometrically, the mean
represents the point on the x axis where a sheet of metal in the shape
of the histogram would balance on a knife edge. For a histogram like
that of Fig. 1, it is clear that $\bar{x}$ defines a measure of central tendency,
that is, a value about which the data tend to concentrate. The mean
is ordinarily meant when the word average is used. For example, the
statement that the average weight of a group of people is 140 pounds 
implies that this is their mean weight. 

If the $x_i$ and $f_i$ are not large, the value of $\bar{x}$ is easily computed from
its definition, particularly if a calculating machine is available. Otherwise
considerable time is saved for frequency tables having equal class
intervals by using a short method based on introducing a new variable,
u, which takes on only small integral values and which is defined by

(3)    $$x_i = c u_i + x_0$$


where c is the class interval and $x_0$ is a conveniently chosen class mark.
The computations are somewhat easier if $x_0$ is chosen as a class mark
near the mean of the distribution. When this expression is substituted
for $x_i$ in (2),

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{h} (c u_i + x_0) f_i = \frac{1}{n}\sum_{i=1}^{h} (c u_i f_i + x_0 f_i) = \frac{1}{n}\sum_{i=1}^{h} c u_i f_i + \frac{1}{n}\sum_{i=1}^{h} x_0 f_i$$


Since c and $x_0$ are constants with respect to these summations, they
may be factored out and placed in front of the summation signs; hence

$$\bar{x} = c\,\frac{1}{n}\sum_{i=1}^{h} u_i f_i + x_0\,\frac{1}{n}\sum_{i=1}^{h} f_i$$


From (2) it is clear that the coefficient of c is $\bar{u}$, while from the definition
of n the coefficient of $x_0$ is 1; hence

(4)    $$\bar{x} = c\bar{u} + x_0$$


Since the computations needed to find $\bar{u}$ are relatively easy, the value
of $\bar{x}$ can be obtained quite easily without the aid of a calculating
machine. This short method is illustrated in Table 2. The data for
this frequency distribution are from 1,000 telephone conversations, the
variable x being the length of a telephone conversation in seconds,
recorded to the nearest second. Here $x_0$ was chosen as 449.5 because
this choice gives rise to smaller products than other choices, although
549.5 is nearly as good.

TABLE 2

     x         f       u       uf
    49.5       6      -4      -24
   149.5      28      -3      -84
   249.5      88      -2     -176
   349.5     180      -1     -180
   449.5     247       0        0
   549.5     260       1      260
   649.5     133       2      266
   749.5      42       3      126
   849.5      11       4       44
   949.5       5       5       25
  Totals   1,000              257

When (4) is applied to Table 2,

$$\bar{x} = 100\left(\frac{257}{1{,}000}\right) + 449.5 = 475.2$$
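
The short method is easy to check numerically. The sketch below, in Python, recomputes the mean of Table 2 both from formula (4) and directly from definition (2); the variable names are illustrative choices, while the data are those of Table 2.

```python
# Class marks and frequencies from Table 2 (lengths of 1,000 telephone calls).
marks = [49.5 + 100 * i for i in range(10)]
freqs = [6, 28, 88, 180, 247, 260, 133, 42, 11, 5]

c, x0 = 100, 449.5                          # class interval and chosen origin
u = [round((x - x0) / c) for x in marks]    # coded values -4, -3, ..., 5
n = sum(freqs)

sum_uf = sum(ui * fi for ui, fi in zip(u, freqs))
u_bar = sum_uf / n
x_bar_short = c * u_bar + x0                # formula (4)
x_bar_direct = sum(x * f for x, f in zip(marks, freqs)) / n   # formula (2)

print(sum_uf, round(x_bar_short, 1), round(x_bar_direct, 1))
```

Both routes give a mean of about 475.2 seconds; the coded route only ever multiplies small integers, which is the point of the method when working by hand.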

For certain common types of distributions, the mean is superior to 
other ordinary measures of central tendency, some of which will be 
considered briefly later. This superiority rests largely on the fact that 
in repeated sampling experiments from such distributions, of which the 
data at hand are thought of as the result of one such experiment, the 
mean usually tends to be more stable than these other measures of 
central tendency. For example, suppose one took a sample of 5 trees 
from a forest and calculated their mean height. Instead of the mean, 
one could have chosen, say, the middle height of the 5 as the measure 
of central tendency. Now, if one repeated this experiment a large 
number of times, he would usually find that the set of means would 
tend to be more closely clustered together than the set of middle 
measurements. This property of greater stability is particularly important
in later work when a precise estimate of a population mean is
desired. It should be clearly understood that the mean possesses 
these advantages only for certain types of distributions which are of 
particular importance and which will be considered in later chapters. 
There are other well-known distributions for which the mean is a 
very poor measure of central tendency. 
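
The tree-height experiment described above can be imitated by simulation. The following Python sketch is only an illustration under assumptions that are not in the text: heights are taken to be normally distributed with a hypothetical mean of 60 and standard deviation of 8, and the experiment is repeated 5,000 times with a fixed random seed.

```python
import random
import statistics

random.seed(0)  # fixed seed so that the run is repeatable

def draw_sample(size=5, mu=60.0, sigma=8.0):
    """One hypothetical sample of tree heights from a normal population."""
    return [random.gauss(mu, sigma) for _ in range(size)]

means, middles = [], []
for _ in range(5000):
    sample = draw_sample()
    means.append(statistics.mean(sample))
    middles.append(statistics.median(sample))   # the middle height of the 5

# Spread of each set of estimates over the repeated experiments.
spread_of_means = statistics.pstdev(means)
spread_of_middles = statistics.pstdev(middles)
print(spread_of_means, spread_of_middles)
```

For a normal population the set of means should cluster more tightly than the set of middle measurements. For a heavy-tailed population the comparison can come out the other way, which is the caution stated at the end of the paragraph above.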

2. The Second Moment as a Measure of Variation 

The concept of variation is of paramount importance in statistics.
Statistical methods have often been called methods for studying variation.
The problem of measuring variation occurs repeatedly in the
various sciences and in certain branches of industry. For example,
in order to detect any lack of uniformity in the quality of a manufactured
product, it is first necessary to know the variability of the
product. This may be illustrated by the following problem. Suppose
a purchaser of wire will not tolerate wire that does not possess a tensile
strength of at least 50 pounds, and that he is considering purchasing his
wire from one or the other of two firms. If equal samples taken from
the product of these two firms gave frequency distributions like those
shown in Fig. 4, it is clear that the product of only one of the firms would
satisfy the purchaser’s requirement. Since the mean tensile strength
was 100 pounds in each sample, the purchaser would have had no
basis for making a decision if the variation in tensile strength had been
ignored.

Fig. 4. Hypothetical distribution of tensile strength.

It is customary to assume that variation means variation of the data 
about a measure of central tendency. Since the mean is being used 
as the measure of central tendency here, it is necessary to introduce 
moments about the mean in order to obtain a measure of variation 
from moments. The kth moment about the mean is defined by

(5)    $$m_k = \frac{1}{n}\sum_{i=1}^{h} (x_i - \bar{x})^k f_i$$

Now it will be shown that the second moment about the mean, $m_2$,
can be used as a measure of variation. Since it is often convenient to
have a measure of variation in the same units of measurement as for
the data, $\sqrt{m_2}$ is usually selected instead. This quantity is called the
standard deviation and will be denoted by s; hence

(6)    $$s = \sqrt{\frac{1}{n}\sum_{i=1}^{h} (x_i - \bar{x})^2 f_i}$$

The second moment about the mean, $s^2$, which is more convenient than
the standard deviation as a measure of variation in certain situations,
is called the variance. Some authors define these two quantities with
n replaced by n - 1. Their definitions have certain advantages for
later work but seem quite unnatural here.

If one considers the computation of s for two distributions of differing
spread, like those whose histograms are given in Fig. 4, it should be
clear that s does measure relative variation or spread. The distribution
with the large tails will have a relatively larger value of s because
the large deviations, $x_i - \bar{x}$, when squared and multiplied by their
relatively large frequencies, $f_i$, will contribute heavily to the value of
the sum and will more than compensate for the larger frequencies for
small deviations in the concentrated distribution. The interpretation 
of the standard deviation as a measure of variation will be presented 
a few paragraphs later. At present it is merely a number in the same 
units as x which seems to measure the relative extent to which data 
are concentrated about the mean and which becomes larger as the 
data become more dispersed. 

The calculation of the standard deviation from its definition (6)
becomes inaccurate unless an accurate value of $\bar{x}$ is used, and then the
computations usually become tedious. The change of variable introduced
for computing the mean is also useful for obtaining a short
method of computing the standard deviation for frequency tables
having equal class intervals. From (3) and (4) it follows that

$$x_i - \bar{x} = c(u_i - \bar{u})$$

consequently

$$\begin{aligned}
\frac{1}{n}\sum (x_i - \bar{x})^2 f_i &= \frac{c^2}{n}\sum (u_i - \bar{u})^2 f_i \\
&= \frac{c^2}{n}\sum (u_i^2 - 2u_i\bar{u} + \bar{u}^2) f_i \\
&= c^2\left[\frac{\sum u_i^2 f_i}{n} - 2\bar{u}\,\frac{\sum u_i f_i}{n} + \bar{u}^2\,\frac{\sum f_i}{n}\right] \\
&= c^2\left[\frac{\sum u_i^2 f_i}{n} - \bar{u}^2\right]
\end{aligned}$$

The short method for computing the standard deviation is therefore
given by

(7)    $$s = c\sqrt{\frac{\sum u_i^2 f_i}{n} - \bar{u}^2}$$

Hereafter, as was done in this derivation, the indicated range of summation
will be omitted from the summation sign whenever the range
is obvious.

For data that have not been classified, $f_i = 1$, $u_i = x_i$, and c = 1;
consequently (7) reduces to

$$s = \sqrt{\frac{\sum x_i^2}{n} - \bar{x}^2}$$

This form is often more convenient than (6) for unclassified data,
particularly when the $x_i$ contain at most two digits each.

Table 3 illustrates the technique for computing s for the data of
Table 2.

TABLE 3

     x         f       u       uf      u²f
    49.5       6      -4      -24       96
   149.5      28      -3      -84      252
   249.5      88      -2     -176      352
   349.5     180      -1     -180      180
   449.5     247       0        0        0
   549.5     260       1      260      260
   649.5     133       2      266      532
   749.5      42       3      126      378
   849.5      11       4       44      176
   949.5       5       5       25      125
  Totals   1,000              257    2,351

When (7) is applied to Table 3,

$$s = 100\sqrt{\frac{2{,}351}{1{,}000} - (0.257)^2} = 151$$

correct to the nearest integer.
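
As a check on the arithmetic, the following Python sketch computes s for the data of Table 3 by the short method (7) and compares it with the value obtained directly from the definition of the standard deviation. The variable names are illustrative; the data are those of Table 3.

```python
import math

# Class marks and frequencies from Table 3.
marks = [49.5 + 100 * i for i in range(10)]
freqs = [6, 28, 88, 180, 247, 260, 133, 42, 11, 5]

c, x0 = 100, 449.5
n = sum(freqs)
u = [round((x - x0) / c) for x in marks]    # coded values -4, ..., 5

sum_uf = sum(ui * fi for ui, fi in zip(u, freqs))
sum_u2f = sum(ui * ui * fi for ui, fi in zip(u, freqs))
u_bar = sum_uf / n

# Short method, formula (7).
s_short = c * math.sqrt(sum_u2f / n - u_bar ** 2)

# Direct computation: square root of the second moment about the mean.
x_bar = sum(x * f for x, f in zip(marks, freqs)) / n
s_direct = math.sqrt(sum((x - x_bar) ** 2 * f for x, f in zip(marks, freqs)) / n)

print(sum_u2f, round(s_short), round(s_direct))
```

The two values agree apart from floating-point rounding, and both round to 151.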

In order to interpret the standard deviation as a measure of variation, 
it is necessary to anticipate certain results of later work. For a set of 
data that has been obtained by sampling a particular type of population
called a normal population, it will be shown that the interval
$(\bar{x} - s, \bar{x} + s)$ will usually include about 68 per cent of the observations
and the interval $(\bar{x} - 2s, \bar{x} + 2s)$ will usually include about
95 per cent of the observations. A sketch of a particular normal
distribution is shown in Fig. 3, Chapter III. 

As an illustrative example of this property, consider the data for
which the standard deviation was just computed. Previous calculations
gave $\bar{x} = 475$ and s = 151, correct to the nearest integer; consequently
the above two intervals are (324, 626) and (173, 777) respectively.
The number of observations lying within these intervals may
be found approximately by interpolating as though the observations
in a given interval were dispersed uniformly throughout the interval.
This assumption implies that on the histogram any fractional part
of a class interval will include the same fractional part of the frequencies
in that interval. For ease of interpolation, the histogram for this
frequency distribution is shown in Fig. 5. If interpolation is carried
to the nearest unit, it will be found that the interval (324, 626) will
include 136 + 247 + 260 + 35 measurements, which is 67.8 per cent
of them. The interval (173, 777) excludes 6 + 21 + 9 + 11 + 5
measurements, which is 5.2 per cent. For a histogram as irregular
as this, these results are unusually close to the theoretical percentages.
However, even for histograms possessing a considerable lack of symmetry,
the actual percentages are often surprisingly close to the theoretical
percentages, primarily because the large percentage of measurements
in the short tail included by such an interval is compensated to a
considerable extent by the small percentage of measurements in the
long tail which are included.

Fig. 5. Histogram for the distribution of 1,000 telephone conversations.
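
The interpolation just described can be sketched in Python as follows. The function name `count_between` is hypothetical, and the class boundaries -0.5, 99.5, ..., 999.5 are an assumption inferred from the class marks of Table 2 and the statement that lengths were recorded to the nearest second. The sketch keeps fractional counts rather than rounding each piece to the nearest unit.

```python
def count_between(lo, hi, lower_bounds, width, freqs):
    """Estimated number of observations in (lo, hi), assuming the
    observations in each class are spread uniformly over the class."""
    total = 0.0
    for b, f in zip(lower_bounds, freqs):
        overlap = min(hi, b + width) - max(lo, b)   # length of class inside (lo, hi)
        if overlap > 0:
            total += f * overlap / width
    return total

freqs = [6, 28, 88, 180, 247, 260, 133, 42, 11, 5]
lower_bounds = [-0.5 + 100 * i for i in range(10)]  # assumed true lower boundaries
n = sum(freqs)

within_one_s = count_between(324, 626, lower_bounds, 100, freqs)
within_two_s = count_between(173, 777, lower_bounds, 100, freqs)
print(round(100 * within_one_s / n, 1), round(100 * within_two_s / n, 1))
```

Without intermediate rounding the sketch gives about 67.8 per cent for the first interval and about 94.8 per cent for the second, so that about 5.2 per cent are excluded, in agreement with the figures above.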

For certain common types of data, the standard deviation is superior 
to other common measures of variation, some of which will be considered
briefly later. The superiority rests partly on its greater stability
in repeated sampling experiments and partly on its convenience for
developing statistical theory. The situation with respect to other
measures of variation is very much like that of the mean with respect
to other measures of central tendency.

3. The Third Moment as a Measure of Skewness 

As was indicated previously, skewness implies a lack of symmetry 
with respect to a vertical axis through the mean. Increasing degrees of 
skewness are well illustrated by Figs. 1, 2, and 3. Distributions with 
a slight amount of skewness seem to be more common in statistical 
applications than any other type. Now, the third moment about the
mean, $\sum (x_i - \bar{x})^3 f_i / n$, has properties that make it useful for measuring
the amount of skewness in a distribution. The third moment will be
zero for a symmetrical histogram because, for each positive deviation,
$x_i - \bar{x}$, there will be a corresponding negative deviation with the same
frequency, $f_i$, so that these deviations when cubed and multiplied by
$f_i$ will cancel each other in the summation. For a histogram with a
large right tail, in which the distribution is said to be skewed to the 
right or positively skewed, the third moment will tend to be positive 
because these large positive deviations when cubed and multiplied by 
their relatively large frequencies contribute heavily to the sum. As 
it stands, however, the third moment about the mean is not satisfactory 
as a measure of skewness because its value depends upon the units of 
measurement of x. To obtain a measure that is a pure number, the 
third moment is divided by the cube of the standard deviation. This 
ratio gives a measure that is not only independent of the scale of units 
of $x$ but also independent of the choice of origin, a property that would
not hold, for example, if the third moment had been divided by the
cube of the mean. If this measure of skewness is denoted by $a_3$,

$$a_3 = \frac{m_3}{m_2^{3/2}}$$

This measure can be zero without the distribution's being symmetrical;
however, it usually serves as a highly satisfactory measure
of skewness. For Figs. 1, 2, and 3, the values of $a_3$ are approximately
$-0.5$, 1.8, and 2.3, respectively. The procedure for calculating $a_3$ will
be considered in the next section.
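The properties claimed for $a_3$ can be checked on small frequency tables. The following is a minimal sketch computed directly from the definitions; the function names and the two frequency tables are hypothetical and were chosen only for illustration.

```python
def moment_about_mean(xs, fs, k):
    """k-th moment about the mean of classified data (class marks xs, frequencies fs)."""
    n = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / n
    return sum((x - mean) ** k * f for x, f in zip(xs, fs)) / n

def skewness(xs, fs):
    """Third moment about the mean divided by the cube of the standard deviation."""
    return moment_about_mean(xs, fs, 3) / moment_about_mean(xs, fs, 2) ** 1.5

marks = [1, 2, 3, 4, 5]
symmetric = skewness(marks, [1, 4, 6, 4, 1])       # symmetrical histogram
right_tailed = skewness(marks, [6, 4, 3, 2, 1])    # long right tail
# Changing the origin and the scale of x should leave the measure unchanged.
rescaled = skewness([10 + 2.5 * x for x in marks], [6, 4, 3, 2, 1])
print(symmetric, right_tailed, rescaled)
```

The symmetrical table gives zero, the table with the long right tail gives a positive value, and the change of origin and scale leaves that value unaltered apart from rounding error.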

4. The Fourth Moment as a Measure of Peakedness 

Consider the two histograms in Fig. 6. They were constructed to 
have the same means and standard deviations but to differ considerably 
with respect to the peakedness of the graph in the neighborhood of the 
mean. If two distributions have the same standard deviations but
one of them has a considerably larger percentage of its data concentrated
about its mean as in Fig. 6, then that one will usually have
considerably longer tails to compensate for the larger percentage of 
small squared deviations contributing to the total sum of squared 
deviations in the definition of the standard deviation. Because of 
the heavy contribution of the large deviations to the sum of the fourth 
powers of the deviations, the peaked distribution with long tails will 
therefore tend to have a relatively larger value of the fourth moment 
about the mean. It does not follow from these considerations that
the more peaked of two such distributions necessarily has the larger
fourth moment. Examples can be constructed for which this is not
true; nevertheless, it is convenient to treat the fourth moment as a
measure of peakedness. In order to obtain a measure of peakedness
that is independent of the units of measurement, the fourth moment 
is divided by the fourth power of the standard deviation. If $a_4$
denotes this measure of peakedness,

$$a_4 = \frac{m_4}{m_2^2}$$

It is customary in statistical literature to speak of this quantity as 
a measure of kurtosis. There seems to be little point, however, in 
perpetuating a Greek word to describe this property of a distribution, 
particularly since the property is not precisely defined. 
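A small numerical illustration may help here. The sketch below compares two hypothetical frequency tables, constructed for this purpose only, that share the same mean and standard deviation while one of them is much more concentrated about the mean and has longer tails; the function name `peakedness` is likewise an assumption.

```python
def peakedness(xs, fs):
    """Fourth moment about the mean divided by the square of the second moment."""
    n = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / n
    m2 = sum((x - mean) ** 2 * f for x, f in zip(xs, fs)) / n
    m4 = sum((x - mean) ** 4 * f for x, f in zip(xs, fs)) / n
    return m4 / m2 ** 2

# Both tables have mean 0 and standard deviation 1.
moderate = peakedness([-2, -1, 0, 1, 2], [1, 4, 6, 4, 1])
peaked = peakedness([-3, 0, 3], [1, 16, 1])   # most data at the mean, long tails
print(moderate, peaked)
```

The second table, with sixteen of its eighteen measurements at the mean, yields the larger value of the measure, in agreement with the argument above.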

It is customary to compare the peakedness of distributions with
that of the previously mentioned normal distribution, for which this
measure turns out to be 3. For Figs. 1, 2, and 3, the values of $a_4$ are
approximately 2.85, 7.15, and 8.37, respectively. Although a comparison
of the values of $a_3$ and $a_4$ for these three illustrations might
indicate a relationship between these two measures, they are, nevertheless,
independent measures.

Moments about the mean are tedious to compute by means of the
formulas defining them. The change of variable for shortening the 
calculation of both the mean and the standard deviation for equal 
intervals can be used to advantage here also. To obtain the short 
method for calculating the kth moment about the mean, it is merely 
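necessary to substitute $x_i - \bar{x} = c(u_i - \bar{u})$ in (5) and apply the
binomial theorem. Thus,

$$m_k = \frac{1}{n}\sum c^k (u_i - \bar{u})^k f_i$$

$$= \frac{c^k}{n}\sum\left[u_i^k - k u_i^{k-1}\bar{u} + \frac{k(k-1)}{2}\,u_i^{k-2}\bar{u}^2 - \cdots + (-1)^k \bar{u}^k\right] f_i$$

$$= c^k\left\{\frac{\sum u_i^k f_i}{n} - k\bar{u}\,\frac{\sum u_i^{k-1} f_i}{n} + \frac{k(k-1)}{2}\,\bar{u}^2\,\frac{\sum u_i^{k-2} f_i}{n} - \cdots + (-1)^k \bar{u}^k\right\} \tag{8}$$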


The sums occurring in this formula are relatively easy to compute with
a table of powers and a calculating machine. The technique for computing
$a_3$ and $a_4$ will be illustrated on the data of Table 2. The computations
are shown in Table 4.

TABLE 4

      x         f      u      uf    u^2 f    u^3 f    u^4 f
     49.5       6     -4     -24       96     -384    1,536
    149.5      28     -3     -84      252     -756    2,268
    249.5      88     -2    -176      352     -704    1,408
    349.5     180     -1    -180      180     -180      180
    449.5     247      0       0        0        0        0
    549.5     260      1     260      260      260      260
    649.5     133      2     266      532    1,064    2,128
    749.5      42      3     126      378    1,134    3,402
    849.5      11      4      44      176      704    2,816
    949.5       5      5      25      125      625    3,125
    Totals  1,000            257    2,351    1,763   17,123

Here


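$$\frac{\sum u_i f_i}{n} = 0.257, \qquad \frac{\sum u_i^2 f_i}{n} = 2.351, \qquad \frac{\sum u_i^3 f_i}{n} = 1.763, \qquad \frac{\sum u_i^4 f_i}{n} = 17.123$$

Therefore,

$$m_3 = 100^3\{1.763 - 3(0.257)(2.351) + 3(0.257)^2(0.257) - (0.257)^3\} = 100^3\{-0.016\}$$

$$m_4 = 100^4\{17.123 - 4(0.257)(1.763) + 6(0.257)^2(2.351) - 4(0.257)^3(0.257) + (0.257)^4\} = 100^4\{16.23\}$$

Since $m_2 = 100^2\{2.351 - (0.257)^2\} = 100^2\{2.285\}$, so that $m_2^{3/2} = 100^3\{3.45\}$ and $m_2^2 = 100^4\{5.22\}$,

$$a_3 = \frac{-0.016}{3.45} = -0.005$$

$$a_4 = \frac{16.23}{5.22} = 3.11$$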


As was to be expected from comparing Fig. 5 and Fig. 3, Chapter III, 
these values are close to the theoretical normal distribution values of 
0 and 3 respectively. 
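The arithmetic of Table 4 can be reproduced mechanically. The sketch below applies the binomial expansion of formula (8) to the coded values $u$ and frequencies $f$ of Table 4 with $c = 100$; the function name `short_method_moment` is hypothetical, and the code is meant as an illustration of the short method rather than a prescribed procedure.

```python
from math import comb

def short_method_moment(us, fs, c, k):
    """k-th moment about the mean from coded values u, where x - xbar = c (u - ubar)."""
    n = sum(fs)
    # Moments of u about the origin: sum(u^j f) / n for j = 0, 1, ..., k.
    raw = [sum(u ** j * f for u, f in zip(us, fs)) / n for j in range(k + 1)]
    u_bar = raw[1]
    # Binomial expansion of (u - u_bar)^k, averaged term by term.
    return c ** k * sum(comb(k, j) * raw[j] * (-u_bar) ** (k - j) for j in range(k + 1))

us = [-4, -3, -2, -1, 0, 1, 2, 3, 4, 5]              # column u of Table 4
fs = [6, 28, 88, 180, 247, 260, 133, 42, 11, 5]      # column f of Table 4

m2 = short_method_moment(us, fs, 100, 2)
m3 = short_method_moment(us, fs, 100, 3)
m4 = short_method_moment(us, fs, 100, 4)
a3 = m3 / m2 ** 1.5
a4 = m4 / m2 ** 2
print(round(a3, 3), round(a4, 2))
```

Rounded as in the text, the two measures come out as $-0.005$ and 3.11.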

Formula (8) is convenient for computing the moments about the
mean in terms of moments about the origin. It is merely necessary
to choose $c = 1$ and $u_i = x_i$, and then (8) reduces to

$$m_k = m_k' - k\,m_{k-1}'\,m_1' + \frac{k(k-1)}{2}\,m_{k-2}'\,m_1'^{\,2} - \cdots + (-1)^k m_1'^{\,k} \tag{9}$$


OTHER DESCRIPTIVE MEASURES 

Among the more common other measures of central tendency are 
the median, mode, and geometric mean. 

For a set of measurements arranged in order of magnitude, the 
median is defined as the middle measurement, if there is one, other¬ 
wise as the interpolated middle value. Thus, for the set of measure¬ 
ments, 2, 3, 3, 4, 5, 5, 6, 6, 7, 7, 7, 9, the median is 5.5. For classified 
data the median is defined as the abscissa which divides the area of 
the histogram into two equal parts. Some workers prefer the median 
to the mean when the distribution is heavily skewed because they feel 
that it is more representative of what a measure of central tendency 
should be than the mean is under such circumstances. They might, 
for example, prefer the median when discussing the notion of average 
wage of a community because a few very large incomes would produce 
a mean wage higher than the notion of average wage implies, whereas 
the median wage would not be so affected. 

The mode of a set of measurements is defined as the measurement 
with the maximum frequency, if there is one. For the set of measurements
in the preceding paragraph, the mode is 7. If there is more
than one measurement with the maximum frequency, no completely 
satisfactory definition exists. The mode is used occasionally in situa¬ 
tions similar to those for which the median might be selected. Since 
the mode is of questionable value in descriptive statistics, it will not 
be considered further here. 
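For unclassified data both measures are available in Python's standard `statistics` module. The short sketch below applies them to the twelve measurements quoted above; the variable names are arbitrary.

```python
from statistics import median, multimode

measurements = [2, 3, 3, 4, 5, 5, 6, 6, 7, 7, 7, 9]

# With an even number of measurements, median() averages the two middle values.
middle = median(measurements)
# multimode() lists every value that attains the maximum frequency.
most_frequent = multimode(measurements)
print(middle, most_frequent)
```

Because `multimode` returns a list, a set of measurements with more than one value of maximum frequency is reported as such rather than being forced to a single mode.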

The geometric mean of a set of measurements is defined as
$\sqrt[n]{x_1^{f_1} x_2^{f_2} \cdots x_h^{f_h}}$. If the data are classified, $x_i$ represents the $i$th
class mark; otherwise it represents the $i$th measurement, in which
event all the $f_i$ equal 1. It will be observed that the logarithm of
the geometric mean is equal to the arithmetic mean of the logarithms. 
This measure is used principally in working with business index num¬ 
bers, for which it possesses certain advantages. 
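The logarithmic property suggests a convenient way of computing the geometric mean. The following is a minimal sketch for positive measurements; the function name and the numerical examples are hypothetical.

```python
from math import exp, log

def geometric_mean(xs, fs):
    """Geometric mean of positive values xs with frequencies fs, computed as the
    antilogarithm of the arithmetic mean of the logarithms."""
    n = sum(fs)
    mean_log = sum(f * log(x) for x, f in zip(xs, fs)) / n
    return exp(mean_log)

unclassified = geometric_mean([1, 4, 16], [1, 1, 1])   # cube root of 1 * 4 * 16
classified = geometric_mean([2, 8], [3, 1])            # fourth root of 2^3 * 8
print(unclassified, classified)
```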

Among the more common measures of variation are the range and 
mean deviation. 

The range, which is the difference between the largest and smallest 
measurement in the set, is used as a measure of variation largely 
because of its ease of computation. It is often applied in certain 
industrial engineering work. It has two important disadvantages. 
First, its value usually increases with n because there is a better 
chance of obtaining extreme measurements if a large sample of data 
is taken than if a small sample is taken. It is possible, however, to 
make allowance for this growth and thus eliminate this disadvantage 
of the range. Secondly, the range is usually quite unstable in repeated 
sampling experiments of the same size when n is large; consequently, 
its use is ordinarily restricted to sets of data containing less than 
10 observations each. Because of its importance in various fields, the 
range will be studied more fully in a later chapter. 

The mean deviation is defined as $\sum |x_i - \bar{x}| f_i / n$, where the absolute
values, that is, the positive values of deviations, are employed. This
measure of variation is often used because it appears to be easier to 
calculate and understand than the standard deviation. It will be 
found, however, that the short method of calculating the standard 
deviation is about as fast as calculating the mean deviation, when 
n is large. 
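Both measures of variation are easily computed from their definitions. The sketch below uses the twelve measurements of the median illustration, arranged as a frequency table; the function names are hypothetical.

```python
def value_range(xs):
    """Difference between the largest and smallest measurement."""
    return max(xs) - min(xs)

def mean_deviation(xs, fs):
    """Average absolute deviation from the mean for values xs with frequencies fs."""
    n = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / n
    return sum(abs(x - mean) * f for x, f in zip(xs, fs)) / n

# The measurements 2, 3, 3, 4, 5, 5, 6, 6, 7, 7, 7, 9 as a frequency table.
values = [2, 3, 4, 5, 6, 7, 9]
counts = [1, 2, 1, 2, 2, 3, 1]

spread = value_range(values)
md = mean_deviation(values, counts)
print(spread, round(md, 4))
```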

There are additional measures of skewness and peakedness in current 
use, but they will not be considered here. Consideration was given to 
these other measures of central tendency and variation only because 
they appear quite often in certain fields of application and a student of 
statistical methods should be acquainted with them. However, for the 
present, moments will be selected as the preferred set of descriptive 
measures unless there are valid reasons for doing otherwise.

An interesting example of a mathematical model for which moments are a poor
choice of descriptive measures is the theoretical distribution given by

$$f(x) = \frac{1}{\pi[1 + (x - m)^2]}$$

For this distribution, which is called the Cauchy distribution, it turns out that the 
mean of a sample of n observations is no better than a single observation for esti¬ 
mating the value of m; the median is a far better measure of central tendency. 
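This behavior can be observed in a small sampling experiment. The sketch below is a simulation under stated assumptions, not a proof: observations are drawn by the inverse-transform device $m + \tan \pi(u - \tfrac{1}{2})$ with $u$ uniform on $(0, 1)$, the sample size of 25 and the 2,000 repetitions are arbitrary choices, and the spread of an estimate is judged by a crude interquartile range.

```python
import math
import random

def cauchy_sample(rng, m, n):
    # Inverse-transform sampling: m + tan(pi (u - 1/2)) follows the Cauchy law centered at m.
    return [m + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def interquartile_range(values):
    # Crude quartiles read directly from the ordered values.
    ordered = sorted(values)
    q = len(ordered) // 4
    return ordered[-q - 1] - ordered[q]

def spread_of_estimates(n, trials=2000, m=0.0, seed=1):
    """Interquartile ranges of the sample mean and sample median over repeated samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        xs = sorted(cauchy_sample(rng, m, n))
        means.append(sum(xs) / n)
        medians.append(xs[n // 2])   # n is taken odd, so this is the middle value
    return interquartile_range(means), interquartile_range(medians)

iqr_single, _ = spread_of_estimates(n=1)
iqr_mean, iqr_median = spread_of_estimates(n=25)
print(iqr_single, iqr_mean, iqr_median)
```

In such an experiment the means of 25 observations should scatter about as widely as single observations do, whereas the medians cluster much more closely about $m$.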


EXERCISES 

1. Weights of 300 entering freshmen ranged from 98 to 226 pounds, correct to 
the nearest pound. Determine class boundaries and class marks for the first and 
last class intervals. 

2. The thickness of 400 washers ranged from 0.421 to 0.563 inch. Determine
class boundaries and class marks for the first and last class intervals. 

3. If the weights in problem 1 had been recorded to the nearest quarter of a 
pound, what change if any would you make in your classification? 

4. Given the following frequency table of the heights in centimeters of 1,000 
students, draw its histogram, indicating the class marks. 


    x    155-157   158-160   161-163   etc.
    f    4   8   26   53   89   146   188   181   125   92   60   22   4   1   1

5. Given the following frequency table of the diameters in feet of 56 shrubs
from a common species: (a) draw its histogram; (b) guess, by merely inspecting the
histogram, the values of $\bar{x}$, $s$, $a_3$, and $a_4$.

    x    1    2    3    4    5    6    7    8    9    10    11    12
    f    1    7    11   16   8    4    5    2    1    0     0     1

6. Given the following frequency table on the number of petals per flower for
ranunculus: (a) draw its histogram; (b) guess, by merely inspecting the histogram,
the values of $\bar{x}$, $s$, $a_3$, and $a_4$.


    x    5      6     7     8    9    10
    f    133    55    23    7    2    2


7. Give one illustration each of types of data for which you would expect the
frequency distribution to be (a) fairly symmetrical, (b) slightly skewed, (c) heavily
skewed, (d) J shaped.

8. For the data of problem 5 calculate $\bar{x}$ by (a) definition, (b) the short method.

9. For the data of problem 5 find (a) the crude median and mode, (b) the median
by interpolation.

10. For the data of problem 5, calculate $s$ by (a) definition, (b) the short method.

11. For the data of problem 5 find the range and mean deviation.

12. For the data of problem 5 calculate $a_3$ and $a_4$ by the short method.

13. Show that $\sum_{i=1}^{h} (x_i - \bar{x}) f_i = 0$.

14. Show that $\bar{X} = \dfrac{N_1\bar{X}_1 + N_2\bar{X}_2}{N}$ if $N_1 + N_2 = N$ and $\bar{X}_1$ and $\bar{X}_2$ are two
means of different data which are to be combined into one set.

15. Use the formula of problem 14 to find $\bar{X}$ for the following data on tuberculosis
deaths by ages in which the intervals are not all equal. Use the short method
for computing $\bar{X}_1$ and $\bar{X}_2$.

    x    0-4   5-9   10-14   etc.   35-44   45-54   etc.
    f    1,356   537   1,278   6,300   10,911   10,349   8,776   15,456   11,060   7,455   4,788   1,866


16. For the histogram of problem 5, (a) find what percentage of the data lies
within the intervals $\bar{x} \pm s$ and $\bar{x} \pm 2s$; (b) compare these percentages with normal
curve values, indicating whether the agreement is about what might be expected
here.

17. Show that $s^2 = \dfrac{N_1 s_1^2 + N_2 s_2^2}{N} + \dfrac{N_1 N_2}{N^2}(\bar{X}_1 - \bar{X}_2)^2$ for a situation like that
of problem 14.

18. Use the formula of problem 17 on the data of problem 15 to find $s^2$, using
the short method to find $s_1^2$ and $s_2^2$.

19 . Would the standard deviation of tree diameters selected at random from a 
given forest increase, decrease, or remain about the same as you took larger and 
larger samples? 



CHAPTER III 


THEORETICAL FREQUENCY DISTRIBUTIONS OF 
ONE VARIABLE 

CONTINUOUS FREQUENCY DISTRIBUTIONS 

1. General Distribution Functions 

In order to obtain statistical methods that will be sufficiently versa¬ 
tile to handle a wide variety of practical problems, it is necessary to 
work mathematically with theoretical frequency distributions which 
represent actual distributions satisfactorily. For a continuous variable 
this implies working with curves rather than with histograms. 

In certain types of statistical work, a set of data is thought of as a 
sample that has been taken from some theoretical frequency distribu¬ 
tion, or population. Thus, data giving the diameters of 100 trees may 
be thought of as having been obtained by selecting 100 trees at random 
from some forest. By random sampling from a practical point of view 
is ordinarily meant a mechanical method of sampling like that for 
games of chance. For example, if each tree of a forest were assigned 
a number on a slip of paper and these slips of paper were thoroughly 
mixed in a container, then drawing from this container would be con¬ 
sidered to give random sampling of the forest. However, the following 
game-of-chance procedure would not be considered random sampling 
of the forest. Suppose that the forest is in the form of a square and 
that it is thought of as having been divided into a large number of 
equal squares by means of equally spaced lines parallel to the sides. 
Then a square may be selected by numbering the squares and drawing 
a number by means of slips of paper as before. After a square has been 
selected, a tree may be selected from the square in a similar manner. 
For this sampling procedure it might happen that one of the small 
squares had a single tree in it while another square had 10 trees; conse¬ 
quently the single tree would be sampled 10 times as often on the 
average as any one of the 10 trees. Although the sampling in both 
procedures is based upon games of chance, the second does not allow the 
game of chance to operate upon the individual of the population. If 
each square contained the same number of trees, however, the second 
procedure would be considered random sampling. 
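The difference between the two procedures can be made explicit by computing the chance with which each individual tree is drawn. The sketch below does so exactly with fractions; the forest of four squares and the function names are hypothetical, and every square is assumed to contain at least one tree.

```python
from fractions import Fraction

def one_stage_probabilities(trees_per_square):
    """Chance of drawing each tree when one slip per tree is mixed in a single container."""
    total = sum(trees_per_square)
    return [[Fraction(1, total)] * t for t in trees_per_square]

def two_stage_probabilities(trees_per_square):
    """Chance of drawing each tree when a square is drawn first and then a tree within it."""
    squares = len(trees_per_square)
    return [[Fraction(1, squares) * Fraction(1, t)] * t for t in trees_per_square]

forest = [1, 10, 5, 4]   # hypothetical numbers of trees in four squares

one = one_stage_probabilities(forest)
two = two_stage_probabilities(forest)
print(one[0][0], one[1][0])   # equal chances under the first procedure
print(two[0][0], two[1][0])   # the lone tree is favored under the second
```

Under the second procedure the single tree in the first square is drawn ten times as often as any one of the ten trees in the second square, which is the defect described above.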

It is to be noted that random sampling from this point of view implies
a finite population, although the population may be thought of as 
extremely large. In sampling from a finite population, it will be as¬ 
sumed that the individual selected is returned to the population so 
that the population remains of constant size. 

In order to apply calculus methods to populations, it is convenient
to conceive of them as infinite in size and to represent their frequency
distributions by curves. A mathematical definition of random sampling
for such idealized infinite populations will be considered later. One
need not be concerned about the lack of reality in the assumption of
an infinite population and the method of sampling randomly from it
since such artificial devices are merely part of a mathematical theory,
or model, and in practice there is not much difficulty in applying the
resulting theory to finite reality.

Fig. 1. Hypothetical observed and theoretical frequency distributions.

Properties. A theoretical distribution function of a continuous vari¬ 
able x will be denoted by f(x). It is defined as that function for which 

$$\int_{\alpha}^{\beta} f(x)\,dx = P[\alpha < x < \beta] \tag{1}$$

where $\alpha < \beta$ are any two values of $x$ and the expression on the right
indicates the probability, that is, the theoretical relative frequency,
with which $x$ will fall between $\alpha$ and $\beta$ in random sampling.

For the purpose of explaining the reasoning behind this definition,
consider the histogram and curve of Fig. 1, in which the total area 
under each is equal to 1. The curve is to be thought of as the graph of 
f(x), and the histogram is to be thought of as the graph of the observed 
frequency distribution of a random sample of size $n$ extracted from the
population represented by $f(x)$. Now, the area of the shaded rectangle
of the histogram must be $f_i/n$, since the ordinates are proportional to
the observed frequencies and $\sum f_i/n = 1$. Thus, the area under the
histogram above the $i$th interval is equal to the relative frequency
with which $x$ occurred in the $i$th class interval. The analogous geometrical
property for the curve would require that the area under the
curve above the $i$th interval be equal to the theoretical relative frequency
with which $x$ will occur in that interval. Since this property
should not depend upon the classification of data, it should hold for
any interval $(\alpha, \beta)$ as in (1).

It follows from (1) that the total area under the graph of $f(x)$ which
lies above the $x$ axis must equal 1 and that $f(x) \geq 0$ if it is a continuous
function; consequently many functions could not serve as mathematical 
models for observed distributions. 

Moments. Moments for theoretical distributions may be defined 
by considering the limit of the sum which defines moments of observed 
distributions. Consider the problem geometrically by means of Fig. 1. 
Since the area of the shaded rectangle of the histogram is $f_i/n$, it follows
from (1), Chapter II, that

$$m_k' = \sum_{i=1}^{h} x_i^k \frac{f_i}{n} = \sum_{i=1}^{h} x_i^k y_i\,\Delta x$$

where $y_i$ denotes the ordinate of the $i$th rectangle and $\Delta x$ the class
interval. For the curve corresponding to this histogram, the natural
procedure would be to define the $k$th moment as the limit of this sum
with $y_i$ replaced by $f(x_i)$ as $\Delta x$ approached zero. Hence, the theoretical
$k$th moment about the origin, which will be denoted by $\mu_k'$, is defined by

$$\mu_k' = \int_a^b x^k f(x)\,dx \tag{2}$$

where $(a, b)$ is the interval over which $f(x)$ is defined. The corresponding
$k$th moment about the mean, by analogy with (5), Chapter II, is
defined by

$$\mu_k = \int_a^b (x - \mu_1')^k f(x)\,dx \tag{3}$$

Throughout this book corresponding Greek and Roman letters will
be used to represent corresponding theoretical and data quantities.
Thus, $\mu_k$ represents the theoretical or population $k$th moment for
which $m_k$ is the corresponding observed or sample value. Then $m_k'$
is thought of as an approximation to $\mu_k'$ based upon a random sample
of size $n$. There will be two exceptions to this rule because of tradition.
The first is that $\bar{x}$ and $m$ will be used to represent a sample and population
mean respectively in place of $m_1'$ and $\mu_1'$. The second is that $p'$
and $p$ will represent a sample and population percentage respectively
in place of $p$ and $\pi$. A third change in notation because of tradition,
but one which does not contradict the above rule, is that $s$ and $\sigma$ will
represent a sample and population standard deviation respectively in
place of $\sqrt{m_2}$ and $\sqrt{\mu_2}$. With these changes in notation, the four
descriptive quantities, based upon moments, for data and theory are
$\bar{x}$, $s$, $a_3$, $a_4$ and $m$, $\sigma$, $\alpha_3$, $\alpha_4$, respectively.

As an illustration of how to find these descriptive quantities for a 
theoretical distribution, consider the function 

$$f(x) = ce^{-x}$$

defined only for non-negative values of x. This function can serve as 
a distribution function provided that c is chosen properly. Since the 
area under the graph of f(x) must equal 1, 

$$1 = \int_0^{\infty} ce^{-x}\,dx = c$$

and hence $f(x) = e^{-x}$. To find the four descriptive quantities for this
distribution, it is convenient to find the $k$th moment about the origin.
If (2) is applied,

$$\mu_k' = \int_0^{\infty} x^k e^{-x}\,dx$$

The value of this definite integral is found in any standard table of
integrals, or it may be evaluated by repeated integration by parts.
This integral is the integral defining the gamma, or factorial, function.
More precisely,

$$\Gamma(k + 1) = \int_0^{\infty} x^k e^{-x}\,dx$$

When $k$ is a positive integer, $\Gamma(k + 1) = k!$. Since $k$ is a positive
integer in this problem, $\mu_k' = k!$; consequently $\mu_1' = 1$, $\mu_2' = 2$,
$\mu_3' = 6$, and $\mu_4' = 24$. Then, since formula (9), Chapter II, for finding
the moments about the mean in terms of moments about the origin
holds for theoretical moments also, $m = 1$, $\mu_2 = 1$, $\mu_3 = 2$, and $\mu_4 = 9$;
consequently $m = 1$, $\sigma = 1$, $\alpha_3 = 2$, and $\alpha_4 = 9$. The graph of this
distribution is shown in Fig. 2. Although this function was selected
merely for illustration of methods, it might conceivably prove useful
as a mathematical model for histograms similar to the one in Fig. 3,
Chapter II.

Fig. 2. The distribution function $f(x) = e^{-x}$, $x \geq 0$.
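The passage from moments about the origin to the four descriptive quantities can be retraced step by step. The sketch below starts from $\mu_k' = k!$ and applies the binomial expansion that underlies formula (9), Chapter II; the function name `central_from_raw` is hypothetical.

```python
from math import comb, factorial, sqrt

def central_from_raw(raw, k):
    """Moment about the mean from the moments about the origin raw[0], ..., raw[k],
    obtained by expanding (x - mean)^k with the binomial theorem."""
    mean = raw[1]
    return sum(comb(k, j) * raw[j] * (-mean) ** (k - j) for j in range(k + 1))

raw = [factorial(k) for k in range(5)]   # moments about the origin for f(x) = e^{-x}
mean = raw[1]
mu2, mu3, mu4 = (central_from_raw(raw, k) for k in (2, 3, 4))
sigma = sqrt(mu2)
alpha3 = mu3 / sigma ** 3
alpha4 = mu4 / sigma ** 4
print(mean, sigma, alpha3, alpha4)
```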

Moment-generating function. Although the direct computation of 
a theoretical moment from its integral definition may be relatively easy 
with the aid of a table of integrals, it is convenient for later theory to 
be able to compute moments indirectly by another method. This 
method will be introduced here and used throughout several chapters 
for deriving formulas. It involves finding what is known as the moment-generating
function. As the name implies, the moment-generating
function is a function that generates moments. It is defined by

$$M_x(\theta) = \int_a^b e^{\theta x} f(x)\,dx \tag{4}$$


This integral is a function of the parameter $\theta$ only, but the subscript
$x$ is placed on $M(\theta)$ to show what variable is being considered. The
parameter $\theta$ has no real meaning here; it is merely a mathematical tool
for aiding in the determination of moments. To see how $M_x(\theta)$ does
generate moments, assume that $f(x)$ is a distribution function for
which this integral exists. Then $e^{\theta x}$ may be expanded in a power series
and the integrations may be performed term by term. Since the power
series for $e^z$ is

$$1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + \cdots$$

it follows that

$$M_x(\theta) = \int_a^b \left[1 + \theta x + \frac{\theta^2 x^2}{2!} + \frac{\theta^3 x^3}{3!} + \cdots\right] f(x)\,dx$$

$$= \int_a^b f(x)\,dx + \theta\int_a^b x f(x)\,dx + \frac{\theta^2}{2!}\int_a^b x^2 f(x)\,dx + \cdots$$

$$= \mu_0' + \mu_1'\theta + \mu_2'\frac{\theta^2}{2!} + \mu_3'\frac{\theta^3}{3!} + \cdots \tag{5}$$

It will be observed that the coefficient of $\theta^k/k!$ in this expansion is the
$k$th moment about the origin; consequently, if the moment-generating
function can be found for a variable $x$ and can be expanded into a
power series in $\theta$, the moments of the variable are readily obtained by
merely inspecting the expansion. If a particular moment is desired,
it may be more convenient to evaluate it by computing the proper
derivative of $M_x(\theta)$ at $\theta = 0$, since repeated differentiation of (5) shows
that

$$\mu_k' = \left.\frac{d^k M_x}{d\theta^k}\right|_{\theta = 0} \tag{6}$$
There are distribution functions that do not possess moments of all 
orders; consequently this method cannot be applied to them. For 
example, if $f(x)$ is defined for $x \geq 0$ by

$$f(x) = \frac{c}{1 + x^4}$$

it will be found by direct integration that no moment higher than the
second exists.

As an illustration of the moment-generating-function technique for 
finding moments, consider the function of the preceding section that 
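illustrated the direct computation of moments from their integral definition.
Here $f(x) = e^{-x}$. Then, by (4),

$$M_x(\theta) = \int_0^{\infty} e^{\theta x} e^{-x}\,dx = \int_0^{\infty} e^{x(\theta - 1)}\,dx = \left[\frac{e^{x(\theta - 1)}}{\theta - 1}\right]_0^{\infty}$$

Since $\theta$ is a parameter that can be chosen as small as desired, this
moment-generating function will exist provided that $\theta < 1$. Then

$$M_x(\theta) = \left[\frac{e^{-x(1 - \theta)}}{\theta - 1}\right]_0^{\infty} = -\frac{1}{\theta - 1} = \frac{1}{1 - \theta}$$

For $|\theta| < 1$,

$$M_x(\theta) = \frac{1}{1 - \theta} = 1 + \theta + \theta^2 + \theta^3 + \cdots = 1 + \theta + 2!\,\frac{\theta^2}{2!} + 3!\,\frac{\theta^3}{3!} + \cdots$$

Since the coefficient of $\theta^k/k!$ is $k!$, it follows that $\mu_k' = k!$, which is the
result obtained by direct integration in the preceding section.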
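Property (6) can also be checked numerically for this example. The sketch below estimates the first two derivatives of $M_x(\theta) = 1/(1 - \theta)$ at $\theta = 0$ by central differences; the step $h = 0.001$ and the function names are arbitrary choices, and the results are approximations only.

```python
def mgf(theta):
    # M_x(theta) = 1 / (1 - theta) for f(x) = e^{-x}, valid for theta < 1.
    return 1.0 / (1.0 - theta)

def first_derivative_at_zero(m, h=1e-3):
    """Central-difference estimate of dM/d(theta) at theta = 0."""
    return (m(h) - m(-h)) / (2 * h)

def second_derivative_at_zero(m, h=1e-3):
    """Central-difference estimate of the second derivative of M at theta = 0."""
    return (m(h) - 2 * m(0.0) + m(-h)) / h ** 2

first = first_derivative_at_zero(mgf)
second = second_derivative_at_zero(mgf)
print(first, second)
```

The two estimates should lie close to $1! = 1$ and $2! = 2$, the first two moments about the origin.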

Although the $k$th moment, $\mu_k'$, and the moment-generating function,
$M_x(\theta)$, were defined for the variable $x$ only, the definitions can be
generalized to hold for the variable $g(x)$, where $g(x)$ is any function
of $x$. For example, if $g(x) = x - m$, the $k$th moment of $g(x)$ would
be the $k$th moment of $x$ about its mean, and the moment-generating
function of $g(x)$ would yield moments about the mean for $x$. Thus,
general definitions in terms of $g(x)$ enable one to shift easily from
moments about the origin to moments about the mean. In addition,
such general definitions enable one to consider various other useful
changes of variable. These general definitions are the following:

$$\mu_{k:g}' = \int_a^b g^k(x) f(x)\,dx \tag{7}$$

$$M_g(\theta) = \int_a^b e^{\theta g(x)} f(x)\,dx \tag{8}$$


When g(x) = x, these definitions reduce to (2) and (4). When g(x) 
= x — m, definition (7) reduces to (3). 

Two useful properties of the moment-generating function are easily
obtained. Let $c$ be any constant, and let $G(x)$ be a function of $x$ for
which the moment-generating function exists. Then, since $g(x)$ in (8)
represents an arbitrary function, $g(x)$ may be chosen as $g(x) = cG(x)$;
consequently

$$M_{cG}(\theta) = \int_a^b e^{\theta c G(x)} f(x)\,dx = M_G(c\theta) \tag{9}$$

The other property is obtained by choosing $g(x) = G(x) + c$. Then

$$M_{G+c}(\theta) = \int_a^b e^{\theta[G(x) + c]} f(x)\,dx = e^{c\theta}\int_a^b e^{\theta G(x)} f(x)\,dx = e^{c\theta} M_G(\theta) \tag{10}$$

These two properties enable one to dispose of a bothersome constant,
$c$, which is a factor of, or is added to, a function, $G(x)$. Applications
of these two properties will be made in later sections.
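Properties (9) and (10) may be verified numerically for a particular case. The sketch below assumes $f(x) = e^{-x}$ for $x \geq 0$ and $G(x) = x$, and approximates the integral in (8) by a midpoint sum over $(0, 100)$; the constants $c = 2$ and $\theta = 0.2$, the truncation point, and the function name are all arbitrary choices made for illustration.

```python
from math import exp

def mgf_of(g, theta, upper=100.0, steps=100_000):
    """Midpoint-rule approximation of the integral of e^{theta g(x)} e^{-x} over (0, upper)."""
    dx = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += exp(theta * g(x) - x) * dx
    return total

c, theta = 2.0, 0.2
lhs_scale = mgf_of(lambda x: c * x, theta)                # M_{cG}(theta)
rhs_scale = mgf_of(lambda x: x, c * theta)                # M_G(c theta)
lhs_shift = mgf_of(lambda x: x + c, theta)                # M_{G+c}(theta)
rhs_shift = exp(c * theta) * mgf_of(lambda x: x, theta)   # e^{c theta} M_G(theta)
print(lhs_scale, rhs_scale, lhs_shift, rhs_shift)
```

Each pair of values should agree, and for this density they should also agree with $1/(1 - c\theta)$ and $e^{c\theta}/(1 - \theta)$ respectively.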


2. Normal Distribution 

A normal-distribution function is defined as a function of the form

$$f(x) = ce^{-\frac{1}{2}\left(\frac{x - a}{b}\right)^2} \tag{11}$$

where $a$, $b$, and $c$ are parameters so restricted that $f(x)$ has the essential
properties of a distribution function. For example, $c$ must be such that
the area under the normal curve is equal to 1. Normal distributions 
are very useful as mathematical models for many frequency distribu¬ 
tions found in nature and industry. Thus, a great many measurements 
made on manufactured articles possess distributions that can be 
approximated well by normal distributions. As was indicated in the 
preceding chapter, the same is also true of many biological measure¬ 
ments. Figures 1 and 5, Chapter II, illustrate histograms which it 
will be found could be approximated well by means of normal curves.
Even if there were few natural distributions of this form, the normal
distribution would still be extremely important because of its place 
in theoretical work. Thus, it will be shown later that under mild 
restrictions a sample mean approximately follows a normal distribution 
even though the basic variable does not. 

Properties. The characteristic properties of a normal distribution 
may be obtained by studying the four descriptive quantities defined 
in the preceding chapter. Since these quantities are defined in terms of 
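moments, consider the moment-generating function of a normal variable.
It is convenient here to work with the variable $x - a$ rather than
with $x$.

If definition (8) with $g(x) = x - a$ is applied to (11),

$$M_{x-a}(\theta) = \int_{-\infty}^{\infty} e^{\theta(x - a)} f(x)\,dx = c\int_{-\infty}^{\infty} e^{\theta(x - a)}\,e^{-\frac{1}{2}\left(\frac{x - a}{b}\right)^2}\,dx$$

Let $z = (x - a)/b$; then $dx = b\,dz$ and

$$M_{x-a}(\theta) = bc\int_{-\infty}^{\infty} e^{\theta b z - \frac{z^2}{2}}\,dz$$

Complete the square in the exponent as follows:

$$\theta b z - \frac{z^2}{2} = -\frac{1}{2}(z - \theta b)^2 + \frac{1}{2}\theta^2 b^2$$

Then,

$$M_{x-a}(\theta) = bc\,e^{\frac{\theta^2 b^2}{2}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(z - \theta b)^2}\,dz$$

If $t = z - \theta b$, then $dz = dt$ and

$$M_{x-a}(\theta) = bc\,e^{\frac{\theta^2 b^2}{2}}\int_{-\infty}^{\infty} e^{-\frac{t^2}{2}}\,dt$$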

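The value of this integral can be found in any standard table of integrals.
Or it may be evaluated directly by the following device. Let

$$I = \int_0^{\infty} e^{-\frac{t^2}{2}}\,dt$$

Then

$$I^2 = \int_0^{\infty} e^{-\frac{x^2}{2}}\,dx \int_0^{\infty} e^{-\frac{y^2}{2}}\,dy = \int_0^{\infty}\!\!\int_0^{\infty} e^{-\frac{x^2 + y^2}{2}}\,dx\,dy$$

In polar coordinates, where $\theta$ for the moment denotes the polar angle rather than the
parameter of the moment-generating function,

$$I^2 = \int_0^{\pi/2}\!\!\int_0^{\infty} e^{-\frac{r^2}{2}}\,r\,dr\,d\theta = \int_0^{\pi/2}\left[-e^{-\frac{r^2}{2}}\right]_0^{\infty} d\theta = \int_0^{\pi/2} d\theta = \frac{\pi}{2}$$

Hence,

$$\int_{-\infty}^{\infty} e^{-\frac{t^2}{2}}\,dt = 2I = \sqrt{2\pi}$$

and

$$M_{x-a}(\theta) = \sqrt{2\pi}\,bc\,e^{\frac{\theta^2 b^2}{2}} \tag{12}$$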


If this exponential is expanded in a power series,

$$M_{x-a}(\theta) = \sqrt{2\pi}\,bc\left[1 + b^2\frac{\theta^2}{2} + b^4\frac{\theta^4}{8} + \cdots\right]$$

Since the constant term is the moment of zero order, which is merely
the area under the curve, and since this area is always equal to 1 for
a distribution function, it follows that $\sqrt{2\pi}\,bc = 1$ and that

$$M_{x-a}(\theta) = 1 + b^2\frac{\theta^2}{2} + b^4\frac{\theta^4}{8} + \cdots$$

The coefficient of $\theta$ is zero; consequently the mean of the variable
$x - a$ must be zero, and hence the mean of $x$ must be $a$. The mean of a
variable $x$ is usually denoted by $m$, so that $a = m$. Since $x - a = x - m$
and the above generating function gives moments of $x - a$, the
above expansion must give moments of $x$ about its mean. It therefore
follows that all odd moments of a normally distributed variable about
its mean are zero. This was to be expected because from (11) it is
clear that the normal curve is symmetrical with respect to the line
$x = a$. Now the coefficient of $\theta^2/2$ is the second moment, while that
of $\theta^4/24$ is the fourth moment; hence

$$\sigma^2 = \mu_2 = b^2, \qquad \mu_4 = 3b^4$$

Since, from above, $a = m$, $b = \sigma$, and $\sqrt{2\pi}\,bc = 1$, the parameters
$a$, $b$, and $c$ may be solved for in terms of the familiar statistical parameters
$m$ and $\sigma$ and inserted in (11) to give

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2} \tag{13}$$

Thus, a normal distribution is completely determined by specifying
its mean and standard deviation. It should be noted that the only
difference between (11) and (13) is that the parameters in (11) have
now been reduced to two independent parameters which have been
given statistical meaning. It should also be noted that all normal
distributions possess the same amount of peakedness as measured by
moments, namely that expressed by $\alpha_4 = 3$.
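These conclusions can be confirmed by numerical integration of (13). The sketch below uses a midpoint sum over $m \pm 12\sigma$; the values $m = 10$ and $\sigma = 2$ are hypothetical, and the function names and the number of steps are arbitrary choices.

```python
from math import exp, pi, sqrt

def normal_density(x, m, sigma):
    """The normal distribution function with mean m and standard deviation sigma."""
    return exp(-0.5 * ((x - m) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def central_moment(k, m, sigma, half_width=12.0, steps=24_000):
    """Midpoint-rule approximation of the k-th moment about m over m +/- half_width * sigma."""
    a = m - half_width * sigma
    dx = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * dx
        total += (x - m) ** k * normal_density(x, m, sigma) * dx
    return total

m, sigma = 10.0, 2.0   # hypothetical mean and standard deviation
area = central_moment(0, m, sigma)
mu2 = central_moment(2, m, sigma)
alpha3 = central_moment(3, m, sigma) / sigma ** 3
alpha4 = central_moment(4, m, sigma) / sigma ** 4
print(area, mu2, alpha3, alpha4)
```

The area should be very nearly 1, the second moment very nearly $\sigma^2$, and the two ratios very nearly 0 and 3, whatever values of $m$ and $\sigma$ are chosen.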

The graph of a normal distribution function is shown in Fig. 3. 

For the purpose of interpreting the standard deviation geometrically, 
consider the points of inflection of a normal curve. When (13) is 
differentiated twice,

$$f' = -\frac{1}{\sigma^2}(x - m) f$$

$$f'' = \frac{1}{\sigma^4}\left[(x - m)^2 - \sigma^2\right] f$$

From the first derivative it is clear that there is but one maximum
point, which occurs at $x = m$. From the second derivative it follows
that points of inflection occur at $x = m \pm \sigma$. Geometrically, then,
the standard deviation is the distance from the axis of symmetry to a
point of inflection.

In the preceding chapter, meaning was given to the standard deviation
as a measure of variation by stating that, for histograms approximating
a normal curve, the interval $\bar{x} \pm s$ included about 68% of
the data while $\bar{x} \pm 2s$ included about 95% of the data. This property
will now be verified.

From (1), the relative frequency with which x will fall in the interval 
m db a is given by 



When t — (x — m)/<r , then dx = a dt and 



The value of the last integral and the factor l/Vshr may be found in 
Table II in the back to be 0.3413. Hence the value of the desired 
integral is 0.68, correct to two digits. For the limits m dt 2<r, one may 
verify that t — ±2 and that the area between is 0.95. The unit o f 
measurement give n by t = (x — m)/(r is called a standard unit . 
Table if is thereiore a EaEle ol tne normal distribution in standard 
units, that is, ol a hormal distribution with zero mean and unit standard 
deviation . " " “ — 
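
The two areas just verified from Table II can also be checked numerically. The following is a minimal sketch, assuming only that the area under the standard normal curve is evaluated through the error function instead of a printed table; the helper name normal_area is introduced here for illustration and is not part of the text.

```python
import math

def normal_area(t_low, t_high):
    """Area under the standard normal curve between t_low and t_high."""
    def cdf(t):
        # Area to the left of t, written in terms of the error function.
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return cdf(t_high) - cdf(t_low)

one_sigma = normal_area(-1.0, 1.0)   # interval m ± σ in standard units
two_sigma = normal_area(-2.0, 2.0)   # interval m ± 2σ in standard units
print(round(one_sigma, 2), round(two_sigma, 2))
```

Half of the first area corresponds to the Table II entry 0.3413 quoted above.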

Fitting to histograms. Consider the problem of fitting a normal 
curve to a histogram. If one has reasons for believing that a set of 
data represents a random sample from some normal population, then 
the fitted normal curve would serve as an approximation to the popula¬ 
tion curve. Since a normal distribution is completely determined by 
its mean and standard deviation and these quantities can be rather 
accurately estimated for n fairly large, one would have considerably 
more confidence in the fitted normal curve as representing the popula¬ 
tion distribution than in the histogram of the data as doing so. There 
is not much occasion to fit normal curves to histograms. Frequency 
curve fitting is important in some statistical fields; however, for most 
statistical purposes it is more of an exercise to acquaint the student 
with the normal curve and with the extent to which normal data are 
found in statistical practice. 

As an illustration of the technique of fitting a normal curve to a
histogram, consider once more the data of Table 2, Chapter II, for
which the four descriptive quantities were calculated previously and
whose histogram is shown in Fig. 5, Chapter II. Here $\bar{x} = 475$,
$s = 151$, $\alpha_3 = -0.005$, and $\alpha_4 = 3.11$. It appears that the values of
$\alpha_3$ and $\alpha_4$ are close to the theoretical values of $\alpha_3 = 0$ and $\alpha_4 = 3$ for
a normal distribution; consequently a normal curve might be expected
to fit fairly well. Now choose $m = \bar{x}$ and $\sigma = s$. Then by (13) the
resulting normal distribution is

(14) $$f(x) = \frac{1}{151\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-475}{151}\right)^2}$$


The graph of this function, of course, has unit area and hence must be 
multiplied by the total area of the histogram if it is to fit the histogram. 
However, except for the purpose of seeing how well the curve fits the 
histogram, it is not necessary to calculate ordinates, since the agree¬ 
ment between the fitted curve and the histogram will be determined 
by comparing the corresponding areas under the curve and the histo¬ 
gram for the various class intervals. In the fitting technique it is 
therefore convenient to work with percentage areas under the normal 
curve. These percentage areas for the various class intervals of the 
histogram are calculated systematically by starting with the first class 
interval. Now to any value of x for the curve (14) there corresponds
a value of $t = (x - 475)/151$ for the standard normal curve

(15) $$\varphi(t) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}$$


such that the percentage of area to the left of x in (14) is the 
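same as the percentage of area to the left of t in (15). For, since
$t = (x - 475)/151$, then $dx = 151\,dt$ and

$$\int_{-\infty}^{x} \frac{1}{151\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-475}{151}\right)^2} dx = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^2}{2}}\, dt$$
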


The value of this integral can be obtained from Table II. The proce¬ 
dure for finding these normal curve frequencies is illustrated in Table 1. 

The agreement seems to be excellent except for the rather large 
difference between 230.3 and 260. The extent of such discrepancies is 
more readily realized by comparing the graphs of the histogram and the 
fitted normal curve as shown in Fig. 4. The problem of whether or 
not the fit may be considered satisfactory will be considered in a later 
chapter. 
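
The fitting procedure behind Table 1 can be sketched in a few lines. This is only an illustration under stated assumptions: the total frequency n = 1000 is inferred from the table (theoretical frequency equals 1000 times the interval area), the class boundaries are taken as 99.5, 199.5, ..., 999.5, and Python's statistics.NormalDist stands in for Table II.

```python
from statistics import NormalDist

# Fitted values from the text; N = 1000 is an inference from Table 1.
MEAN, SD, N = 475.0, 151.0, 1000
boundaries = [99.5 + 100.0 * i for i in range(10)]   # 99.5, 199.5, ..., 999.5

std_normal = NormalDist()   # zero mean, unit standard deviation

rows = []
previous_area = 0.0
for x in boundaries:
    t = round((x - MEAN) / SD, 2)    # standard units, rounded as for a printed table
    area = std_normal.cdf(t)         # area to the left of t
    rows.append((x, t, area, N * (area - previous_area)))
    previous_area = area

for x, t, area, freq in rows:
    print(f"{x:6.1f} {t:6.2f} {area:7.4f} {freq:7.1f}")
```

Each printed row gives the class boundary, its value in standard units, the area to its left, and the theoretical frequency for the interval ending at that boundary.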

TABLE 1

Class        t =             Area to     Area for interval   Theoretical   Observed
boundaries   (x - 475)/151   left of t   to left of t        frequency     frequency
x                            A           ΔA                  nΔA

  99.5       -2.49           0.0064      0.0064                 6.4            6
 199.5       -1.82           0.0344      0.0280                28.0           28
 299.5       -1.16           0.1230      0.0886                88.6           88
 399.5       -0.50           0.3085      0.1855               185.5          180
 499.5        0.16           0.5636      0.2551               255.1          247
 599.5        0.82           0.7939      0.2303               230.3          260
 699.5        1.49           0.9319      0.1380               138.0          133
 799.5        2.15           0.9842      0.0523                52.3           52
 899.5        2.81           0.9975      0.0133                13.3           11
 999.5        3.47           0.9997      0.0022                 2.2            5

Applications. The interesting and important applications of normal
distributions will be considered in later chapters after further essential
theory has been developed. Here, only two simple illustrations of its
direct applicability will be given.

Fig. 4. Normal curve fitted to histogram.

Many college instructors of large classes assign letter grades on
examinations by means of the normal distribution. The procedure
followed is to ignore that part of the distribution lying outside of the
interval $m \pm 2.5\sigma$, or $m \pm 3\sigma$, and then divide this interval into five
equal parts corresponding to the letter grades F, D, C, B, and A. If
$m \pm 2.5\sigma$ is used, each interval will be $\sigma$ units in length; consequently
the six values of x determining these five intervals will be $m - 2.5\sigma$,
$m - 1.5\sigma$, $m - 0.5\sigma$, $m + 0.5\sigma$, $m + 1.5\sigma$, and $m + 2.5\sigma$. The
corresponding values of $t = (x - m)/\sigma$ will be $-2.5$, $-1.5$, $-0.5$, $0.5$, $1.5$,
and $2.5$. From Table II it will be found that the areas within these
five intervals are respectively 0.06, 0.24, 0.38, 0.24, and 0.06. Since
these percentages do not total 100%, it is customary to allow the two
end intervals to extend to infinity. Then the percentages of students
who will be assigned the corresponding letter grades are 7% F, 24% D,
38% C, 24% B, and 7% A.
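
The grade percentages above follow directly from areas of the standard normal curve. A short sketch, assuming statistics.NormalDist in place of Table II and the $m \pm 2.5\sigma$ variant of the rule:

```python
from statistics import NormalDist

std_normal = NormalDist()
cuts = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]   # interval boundaries in standard units
grades = ["F", "D", "C", "B", "A"]

# Areas of the five equal intervals inside m ± 2.5σ.
inner = [std_normal.cdf(b) - std_normal.cdf(a) for a, b in zip(cuts, cuts[1:])]

# Let the two end intervals run out to infinity so the shares total 100%.
open_ended = list(inner)
open_ended[0] = std_normal.cdf(cuts[1])
open_ended[-1] = 1.0 - std_normal.cdf(cuts[-2])

for grade, share in zip(grades, open_ended):
    print(grade, f"{100 * share:.0f}%")
```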

As a second illustration, consider the following problem. If skulls 
are classified into three categories, corresponding to a length-breadth 
index being less than 75, between 75 and 80, or greater than 80, and 
if this index is assumed to be normally distributed, determine the 
approximate mean and standard deviation for a set of skulls for which 
58%, 38%, and 4%, respectively, were found in these categories. From
Table II, the value of $t = (75 - m)/\sigma$ corresponding to an area of
0.58 to the left of x = 75 is t = 0.20. Similarly, the value of
$t = (80 - m)/\sigma$ corresponding to an area of 0.04 to the right of x = 80
is t = 1.75. The values of $m$ and $\sigma$ may now be determined by solving
the equations

$$\frac{75 - m}{\sigma} = 0.20, \qquad \frac{80 - m}{\sigma} = 1.75$$

The solution of these equations is $m = 74.4$ and $\sigma = 3.2$.
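
The same two equations can be solved numerically. In this sketch the standard-unit values are taken from the inverse of the normal distribution function (statistics.NormalDist.inv_cdf) rather than read from Table II, so the results may differ from the text in the last digit because the table values 0.20 and 1.75 are rounded.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Observed shares from the text: 58% below 75 and 4% above 80.
t_75 = std_normal.inv_cdf(0.58)         # close to the table value 0.20
t_80 = std_normal.inv_cdf(1.0 - 0.04)   # close to the table value 1.75

# Solve (75 - m)/sigma = t_75 and (80 - m)/sigma = t_80 for m and sigma.
sigma = (80.0 - 75.0) / (t_80 - t_75)
m = 75.0 - t_75 * sigma
print(m, sigma)
```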

DISCRETE FREQUENCY DISTRIBUTIONS 

The only discrete variables that will be considered here are those 
which take on non-negative integral values. For example, the variable 
might be the number of heads obtained in tossing 20 coins, or the num¬ 
ber of accidents a car owner has per year. 

1. Moments 

From analogy with (1), Chapter II, the kth moment of a theoretical
discrete frequency distribution will be defined by

(16) $$\mu_k' = \sum_{x=0}^{\infty} x^k P(x)$$

where P(x) is the probability that the variable takes on the value
x. It should be clear that this is merely (1), Chapter II, with the sample
relative frequency $f_i/n$ replaced by its theoretical value. Similarly,
the kth moment of the variable g(x) is defined by

(17) $$\mu_{k:g}' = \sum_{x=0}^{\infty} g^k(x) P(x)$$

Finally, the moment-generating function of g(x) is defined by

(18) $$M_g(\theta) = \sum_{x=0}^{\infty} e^{\theta g(x)} P(x)$$

For the purpose of verifying that this function does generate moments,
expand $e^{\theta g(x)}$ and sum term by term. Thus

$$\begin{aligned}
M_g(\theta) &= \sum_{x=0}^{\infty} \left[1 + \theta g(x) + \frac{\theta^2}{2!} g^2(x) + \cdots\right] P(x) \\
&= \sum_{x=0}^{\infty} P(x) + \theta \sum_{x=0}^{\infty} g(x) P(x) + \frac{\theta^2}{2!} \sum_{x=0}^{\infty} g^2(x) P(x) + \cdots \\
&= \mu_{0:g}' + \mu_{1:g}'\,\theta + \mu_{2:g}'\,\frac{\theta^2}{2!} + \cdots
\end{aligned}$$



2. Basic Rules of Probability 

Before considering particular discrete distributions, it is desirable 
to review briefly the basic rules of discrete probability. Most college 
algebra books introduce these rules and apply them to simple games of 
chance. 

If P(A) is the probability that the event A will occur and P(B) is 
the probability that the event B will occur, then the addition rule of 
probability, 

(19) $$P = P(A) + P(B)$$

gives the probability that either A or B will occur, provided that A 
and B are mutually exclusive events. For example, if P(A) = 1/4
is the probability that an individual will win a $1.00 prize at a punch
board and P(B) = 1/12 is the probability that he will win a $5.00 prize
at this same punch board, then P = 1/4 + 1/12 = 1/3 is the probability
that he will win either a $1.00 prize or a $5.00 prize in a single punch
at the punch board.

If P(A, B) denotes the probability that both A and B will occur and
$P_A(B)$ denotes the conditional probability that the event B will occur
when A is known to have occurred, then the multiplication rule of
probability,

(20) $$P(A, B) = P(A)\,P_A(B)$$

gives the probability that both A and B will occur. As an illustration,
consider the probability of drawing 2 spades from a deck of cards in
2 draws. Here both A and B correspond to the event of drawing a
spade; hence $P(A) = \tfrac{13}{52}$, $P_A(B) = \tfrac{12}{51}$, and
$P(A, B) = \tfrac{13}{52} \cdot \tfrac{12}{51} = \tfrac{1}{17}$.

If the events A and B are independent, (20) reduces to

(21) $$P(A, B) = P(A)\,P(B)$$

As an illustration, consider the preceding problem in which the first
card drawn is returned to the deck before the second drawing. Then
$P(A) = P(B) = \tfrac{13}{52} = \tfrac14$ and $P(A, B) = \left(\tfrac14\right)^2 = \tfrac{1}{16}$. As
another illustration, if $P(A) = \tfrac13$ is the probability that an individual
will win a prize at a punch board and $P(B) = \tfrac14$ is the probability
that he will win a prize at a pinball machine, then
$P = \tfrac13 \cdot \tfrac14 = \tfrac{1}{12}$ is the probability that he will win at both games if he takes 1
chance at each.

These three rules of probability suffice for direct derivations of several 
important discrete distributions and for the solution of many impor¬ 
tant practical probability problems. 

As an exercise to develop familiarity with the manipulation of these
formulas, consider the following problem. An urn contains 2 white
and 3 black balls, while a second urn contains 4 white balls and 1 black
ball. If an urn is selected at random and a single ball is drawn, what
is the probability that it will be white? The probability that the first
urn will be selected is 1/2, in which event the probability of drawing a
white ball is 2/5; therefore by (20) the probability of both events
occurring is 1/2 · 2/5 = 1/5. The probability that the second urn will be
selected is likewise 1/2, in which event the probability of drawing a
white ball is 4/5; therefore the probability of both of these events
occurring is 1/2 · 4/5 = 2/5. Since these two possibilities constitute
mutually exclusive events, by (19) the probability of one or the other
of these possibilities occurring is 1/5 + 2/5 = 3/5, which is therefore the
desired probability.
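
The three rules can be exercised with exact fractions. The sketch below reworks the spade and urn illustrations; the variable names are introduced here for illustration only.

```python
from fractions import Fraction

# Multiplication rule (20): two spades in two draws without replacement.
p_two_spades = Fraction(13, 52) * Fraction(12, 51)

# Independent events (21): the first card is returned before the second draw.
p_two_spades_replaced = Fraction(13, 52) ** 2

# Urn problem: rule (20) within each urn, then the addition rule (19) across urns.
p_urn = Fraction(1, 2)
p_white = p_urn * Fraction(2, 5) + p_urn * Fraction(4, 5)

print(p_two_spades, p_two_spades_replaced, p_white)
```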

3. Binomial Distribution 

Let p be the probability that an event will occur at a single trial,
and let $q = 1 - p$ denote the probability that it will fail to occur. If
the event occurs at a given trial, it will be called a success, otherwise a
failure. Let n independent trials be made, and denote by x the number
of successes observed in the n trials. Then consider the problem of
determining the probability of obtaining precisely x successes in n
trials.

First, determine the probability of obtaining x consecutive successes,
followed by $n - x$ consecutive failures. These n events are independent;
therefore by (21) this probability is

$$\underbrace{p \cdot p \cdots p}_{x} \cdot \underbrace{q \cdot q \cdots q}_{n-x} = p^x q^{n-x}$$

The probability of obtaining precisely x successes and n—x failures 
in some other order of occurrence is the same as for this particular 
order because the p’s and q’s are merely rearranged to correspond to 
the other order. In order to solve the problem, it is therefore necessary 
to count the number of such orders. 

The number of orders is the number of permutations possible with n
letters of which x are alike (p's) and the remaining $n - x$ are alike
(q's). Now a familiar college algebra formula states that the number
of permutations of n things of which $n_1$ are alike, $n_2$ are alike, ..., and
$n_k$ are alike is given by

(22) $$\frac{n!}{n_1!\,n_2! \cdots n_k!}$$

A direct application of this formula shows that the number of
permutations of the p's and q's is equal to

(23) $$\frac{n!}{x!\,(n-x)!}$$


Now by (19) the probability that one or the other of a set of mutually
exclusive events will occur is the sum of their separate probabilities;
consequently it is necessary to add $p^x q^{n-x}$ as many times as there are
different orders in which the desired result can occur. Since (23) gives
the number of such orders, it follows that the probability of obtaining
x successes in n independent trials of an event, for which p is the
probability of success in a single trial, is given by

(24) $$P(x) = \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}$$

This function is called the binomial or Bernoulli distribution function.
The name binomial comes from the relationship of (24) to the
following binomial expansion:

(25) $$(q + p)^n = q^n + n q^{n-1} p + \frac{n(n-1)}{2!}\, q^{n-2} p^2 + \cdots + p^n = \sum_{x=0}^{n} \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x}$$

From (24) it is clear that (25) may be written

$$(q + p)^n = \sum_{x=0}^{n} P(x)$$

Thus the terms in this binomial expansion give the probabilities of the
various possible results in their natural order.

The binomial distribution can be used to solve many practical problems
related to repeated trials of an event. Such problems will be
considered later; however, to illustrate the nature of formula (24),
consider two simple problems related to the rolling of a die. If a true
die is rolled 5 times, what is the probability that precisely 2 of the rolls
will show ones? Here success consists of obtaining a one; hence
$p = \tfrac16$, $q = \tfrac56$, and $n = 5$. When (24) is applied, the solution is

$$P(2) = \frac{5!}{2!\,3!} \left(\frac{1}{6}\right)^2 \left(\frac{5}{6}\right)^3 = 0.16$$

If the die is rolled 5 times, what is the probability of obtaining at most
2 ones? To answer this question it is necessary to compute the
probabilities of obtaining precisely 0 ones, 1 one, and 2 ones. Applying (24),

$$P(0) = \frac{5!}{0!\,5!} \left(\frac{1}{6}\right)^0 \left(\frac{5}{6}\right)^5 = 0.40$$

$$P(1) = \frac{5!}{1!\,4!} \left(\frac{1}{6}\right)^1 \left(\frac{5}{6}\right)^4 = 0.40$$

Since these three possibilities are mutually exclusive events, formula
(19) may be applied to give

$$P(x \le 2) = P(0) + P(1) + P(2) = 0.96$$
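
Formula (24) translates directly into code. The following sketch reworks the two die problems; the function name binomial_pmf is an illustrative choice, and math.comb supplies the count of orders given by (23).

```python
from math import comb

def binomial_pmf(x, n, p):
    """Formula (24): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1.0 - p) ** (n - x)

n, p = 5, 1 / 6
p_exactly_two = binomial_pmf(2, n, p)
p_at_most_two = sum(binomial_pmf(x, n, p) for x in range(3))
print(round(p_exactly_two, 2), round(p_at_most_two, 2))
```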

Binomial moments. If the number of trials, n, of an event is large,
the computation of probabilities by means of (24) becomes burdensome.
Since most practical problems related to repeated trials of an
event involve a large number of trials, it is important to find fast
approximate methods for computing such probabilities. Fortunately,
the histogram representing the binomial distribution can be approximated
very well by means of the proper normal curve when n is large;
consequently normal curve methods can be employed to calculate
these probabilities. Before investigating this property in general,
consider a numerical example.

Determine the graph of the binomial distribution for which $p = \tfrac13$
and $n = 12$. This is hardly a large value of n, so that a good normal
approximation is not to be expected here. Since P(x) is to be computed
for all values of x from 0 to 12, it is easier to compute each value, after
the first, from the preceding one rather than to compute each value
by itself. Here, by (24),

$$P(x) = \frac{12!}{x!\,(12-x)!} \left(\frac{1}{3}\right)^x \left(\frac{2}{3}\right)^{12-x}$$

It is easily verified that

$$P(k+1) = \frac{12-k}{k+1} \cdot \frac{1}{2}\, P(k)$$

After P(0) was computed, this relationship was used to obtain the
following values:

(26)
P(0)  = 0.007707
P(1)  = 6 P(0)        = 0.046242
P(2)  = (11/4) P(1)   = 0.127166
P(3)  = (5/3) P(2)    = 0.211943
P(4)  = (9/8) P(3)    = 0.238436
P(5)  = (4/5) P(4)    = 0.190749
P(6)  = (7/12) P(5)   = 0.111270
P(7)  = (3/7) P(6)    = 0.047687
P(8)  = (5/16) P(7)   = 0.014902
P(9)  = (2/9) P(8)    = 0.003312
P(10) = (3/20) P(9)   = 0.000497
P(11) = (1/11) P(10)  = 0.000045
P(12) = (1/24) P(11)  = 0.000002

Since P(0) was computed correct to four digits only, the remaining
values would not be expected to be correct to more than four digits,
even though they have been recorded to six decimals for the sake of
appearances. The graph of this binomial distribution is shown in Fig.
5. It appears that this histogram could be fitted quite well by the
proper normal curve.
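
The step-by-step computation of (26) is easy to reproduce. This sketch starts from $P(0) = (2/3)^{12}$ and applies the recurrence above; because it carries full floating-point precision rather than four digits, the later values may differ from (26) in the last recorded decimals.

```python
n = 12
probs = [(2 / 3) ** n]   # P(0) = q^n with q = 2/3
for k in range(n):
    # P(k+1) = (n - k)/(k + 1) * (p/q) * P(k), and p/q = 1/2 when p = 1/3
    probs.append(probs[-1] * (n - k) / (k + 1) * 0.5)

for x, value in enumerate(probs):
    print(f"P({x}) = {value:.6f}")
```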

In order to determine what normal curve should be used to fit any 
given binomial-distribution histogram, it is necessary to determine the 
mean and standard deviation of the general binomial distribution. 
This will be accomplished by means of its moment-generating function. 
When definition (18) is applied to (24),

$$M_x(\theta) = \sum_{x=0}^{n} e^{\theta x}\, \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x} = \sum_{x=0}^{n} \frac{n!}{x!\,(n-x)!}\, (pe^{\theta})^x q^{n-x}$$

But from (25) this sum can be written as a binomial raised to the nth
power because the expansion is purely algebraic and need not be
interpreted in terms of probabilities. Hence

(27) $$M_x(\theta) = (q + pe^{\theta})^n$$

The desired moments may be obtained by applying (6). If (27) is
differentiated twice,

$$M'(\theta) = npe^{\theta}(q + pe^{\theta})^{n-1}$$

and

$$M''(\theta) = npe^{\theta}(q + pe^{\theta})^{n-2}(q + npe^{\theta})$$

The values of these derivatives at $\theta = 0$ are $np$ and $npq + n^2p^2$,
respectively; hence these are the values of $\mu_1'$ and $\mu_2'$, respectively.
From formula (9), Chapter II, and these results, it follows that
$\mu_2 = \mu_2' - \mu_1'^2 = npq$; consequently the general binomial distribution
has its mean and standard deviation given by the formulas


(28) $$m = np, \qquad \sigma = \sqrt{npq}$$

Thus, if a normal curve is to be fitted to a binomial histogram, it is
merely necessary to use formulas (28) and the technique previously
illustrated in Table 1.

If the third and fourth derivatives are evaluated at $\theta = 0$ and
formula (9), Chapter II, is applied to give moments about the mean, it
will be found that

(29) $$\alpha_3 = \frac{q - p}{\sqrt{npq}}, \qquad \alpha_4 = 3 + \frac{1 - 6pq}{npq}$$

From these formulas it is clear that the skewness of any binomial
distribution as measured by $\alpha_3$ approaches zero with increasing n,
and that its peakedness as measured by $\alpha_4$ approaches that of a normal
distribution. Although these are indications that the normal
approximation will be good for large n, it is necessary to show that all the
moments of the binomial distribution approach those of some normal
distribution before one can be certain of the fact. A demonstration
of this property will be given in the next section.
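
Formulas (28) and (29) can be checked against moments computed by direct summation over the binomial probabilities. This is a sketch only; the function name binomial_shape and the choice n = 12, p = 1/3 (the example of Fig. 5) are made here for illustration.

```python
from math import comb, sqrt

def binomial_shape(n, p):
    """Mean, standard deviation, alpha_3 and alpha_4 by direct summation over (24)."""
    q = 1.0 - p
    pmf = [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]
    mean = sum(x * w for x, w in enumerate(pmf))

    def central(k):
        # kth moment about the mean
        return sum((x - mean) ** k * w for x, w in enumerate(pmf))

    sd = sqrt(central(2))
    return mean, sd, central(3) / sd**3, central(4) / sd**4

n, p = 12, 1 / 3
q = 1.0 - p
mean, sd, alpha3, alpha4 = binomial_shape(n, p)

# Closed forms (28) and (29) for comparison.
print(mean, n * p)
print(sd, sqrt(n * p * q))
print(alpha3, (q - p) / sqrt(n * p * q))
print(alpha4, 3 + (1 - 6 * p * q) / (n * p * q))
```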

As an illustration of how to use formulas (28) and normal curve
methods for approximating probabilities of repeated trials of an event,
consider a die problem related to Fig. 5. If a die is rolled 12 times and
the appearance of either a one or a two is classified as a success, what
is the probability of obtaining at least 6 successes? Since $p = \tfrac13$ and
$n = 12$ here, the exact answer correct to three decimals is given by
adding the probabilities in (26) from P(6) through P(12). Hence,

$$P(x \ge 6) = 0.178$$

Geometrically, this answer is the area of that part of the histogram
in Fig. 5 lying to the right of x = 5.5. Therefore, to approximate this
probability by normal curve methods, it is merely necessary to find the
area under that part of the fitted normal curve which lies to the right
of 5.5. If (28) is applied,

$$m = np = 4, \qquad \sigma = \sqrt{npq} = 1.63$$

Consequently,

$$t = \frac{5.5 - 4}{1.63} = 0.92$$

But from Table II the area to the right of t = 0.92 is 0.179, which,
compared to the correct value of 0.178, is in error by only about
0.5%.

To test the accuracy of normal curve methods over a shorter interval,
consider the probability of obtaining precisely 6 successes in the 12 rolls
of the die. From (26) the answer correct to three decimals is

$$P(6) = 0.111$$

To approximate this answer, it is merely necessary to find the area
under the fitted normal curve between x = 5.5 and x = 6.5. Thus,

$$t_2 = \frac{6.5 - 4}{1.63} = 1.53, \qquad A_2 = 0.4370$$

$$t_1 = \frac{5.5 - 4}{1.63} = 0.92, \qquad A_1 = 0.3212$$

Therefore the required area is 0.116, which is in error by about 5%.
From these two examples it appears that normal curve methods are
quite accurate, even for some situations such as the one considered
here in which n is not very large.
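
The comparison between exact binomial probabilities and the fitted normal curve can be repeated numerically. The sketch below assumes statistics.NormalDist in place of Table II and uses the unrounded value of $\sigma$, so its figures may differ slightly from those obtained with 1.63 and two-decimal table lookups.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 12, 1 / 3
q = 1.0 - p
pmf = [comb(n, x) * p**x * q ** (n - x) for x in range(n + 1)]

# Normal curve fitted by formulas (28): m = np and sigma = sqrt(npq).
fitted = NormalDist(mu=n * p, sigma=sqrt(n * p * q))

exact_at_least_6 = sum(pmf[6:])
approx_at_least_6 = 1.0 - fitted.cdf(5.5)              # area to the right of 5.5

exact_exactly_6 = pmf[6]
approx_exactly_6 = fitted.cdf(6.5) - fitted.cdf(5.5)   # area between 5.5 and 6.5

print(round(exact_at_least_6, 3), round(approx_at_least_6, 3))
print(round(exact_exactly_6, 3), round(approx_exactly_6, 3))
```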

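Normal curve approximation. Thus far the fact that the binomial
distribution can be approximated well for large n by the normal
distribution with $m = np$ and $\sigma = \sqrt{npq}$ has been made plausible by
numerical examples and by inspecting $\alpha_3$ and $\alpha_4$. Now consider the
verification of this fact by means of the moment-generating function.
Here it is convenient to use the variable

$$t = \frac{x - np}{\sqrt{npq}} = \frac{x - m}{\sigma}$$

From properties (9) and (10), and (27), it follows that

(30) $$M_t(\theta) = M_{x-m}\!\left(\frac{\theta}{\sigma}\right) = e^{-\frac{m\theta}{\sigma}}\, M_x\!\left(\frac{\theta}{\sigma}\right) = e^{-\frac{m\theta}{\sigma}} \left(q + pe^{\frac{\theta}{\sigma}}\right)^n$$

Taking the logarithm of both sides to the base e gives

$$\log M_t(\theta) = -\frac{m\theta}{\sigma} + n \log\left(q + pe^{\frac{\theta}{\sigma}}\right)$$

Expanding $e^{\theta/\sigma}$ and replacing $q + p$ by 1 yields

$$\log M_t(\theta) = -\frac{m\theta}{\sigma} + n \log\left\{1 + p\left[\frac{\theta}{\sigma} + \frac{1}{2!}\left(\frac{\theta}{\sigma}\right)^2 + \frac{1}{3!}\left(\frac{\theta}{\sigma}\right)^3 + \cdots\right]\right\}$$

The logarithm on the right may be treated as of the form $\log\{1 + z\}$.
If $|\theta|$ is chosen sufficiently small, the expansion

$$\log\{1 + z\} = z - \frac{z^2}{2} + \frac{z^3}{3} - \frac{z^4}{4} + \cdots$$

may be applied to give

(31) $$\log M_t(\theta) = -\frac{m\theta}{\sigma} + n\left\{p\left[\frac{\theta}{\sigma} + \frac{1}{2!}\left(\frac{\theta}{\sigma}\right)^2 + \cdots\right] - \frac{p^2}{2}\left[\frac{\theta}{\sigma} + \frac{1}{2!}\left(\frac{\theta}{\sigma}\right)^2 + \cdots\right]^2 + \cdots\right\}$$

Collecting terms in powers of $\theta$ gives

$$\log M_t(\theta) = \left(-\frac{m}{\sigma} + \frac{np}{\sigma}\right)\theta + n\left(\frac{p}{\sigma^2} - \frac{p^2}{\sigma^2}\right)\frac{\theta^2}{2!} + \cdots$$

But, since $np = m$ and $\sigma^2 = npq$, the coefficient of $\theta$ vanishes and the
coefficient of $\theta^2/2!$ reduces to 1; consequently

$$\log M_t(\theta) = \frac{\theta^2}{2} + \text{terms in } \theta^k, \quad k = 3, 4, \cdots$$

From an inspection of (31), which shows how terms in $\theta^k$ arise, it is
clear that all terms in $\theta^k$ contain $n/\sigma^k$ as a common factor. The other
factor for each such term is a constant times a power of p. Since this
other factor does not involve n and since

$$\frac{n}{\sigma^k} = \frac{n}{(npq)^{k/2}}$$

with $k \ge 3$ here, all such terms will approach zero as n becomes
infinite. This implies that

$$\lim_{n \to \infty} \log M_t(\theta) = \frac{\theta^2}{2}$$

which in turn implies that

(32) $$\lim_{n \to \infty} M_t(\theta) = e^{\frac{\theta^2}{2}}$$
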

A justification of the above expansions and limits would require a 
knowledge of advanced calculus methods and therefore will not be 
considered here. 

Now compare (32) and the normal moment-generating function given
by (12). From the discussion immediately following (12), it is clear
that for a normal variable x,

(33) $$M_{x-m}(\theta) = e^{\frac{\sigma^2\theta^2}{2}}$$

Since the first equality of (30) holds for any variable, it may be applied
to (33) to give

$$M_t(\theta) = M_{x-m}\!\left(\frac{\theta}{\sigma}\right) = e^{\frac{\theta^2}{2}}$$

A comparison of this result and (32) shows that the binomial variable
$t = (x - np)/\sqrt{npq}$ has a moment-generating function which
approaches the moment-generating function of the normal variable
whose mean is zero and whose standard deviation is 1. This implies
that all the moments of this binomial variable approach those of the
standard normal variable.
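
The limit (32) can be observed numerically. The sketch below evaluates $\log M_t(\theta)$ from the closed form in (30) for increasing n and compares it with $\theta^2/2$; the choices $\theta = 1$, $p = 1/3$, and the particular values of n are assumptions made for illustration. The functions log1p and expm1 are used only to limit rounding error when $\theta/\sigma$ is small.

```python
from math import expm1, log1p, sqrt

def log_mgf_standardized(theta, n, p):
    """log M_t(theta) for t = (x - np)/sqrt(npq), following (30)."""
    q = 1.0 - p
    sigma = sqrt(n * p * q)
    # q + p*e^(theta/sigma) equals 1 + p*(e^(theta/sigma) - 1) because q + p = 1.
    return -n * p * theta / sigma + n * log1p(p * expm1(theta / sigma))

theta, p = 1.0, 1 / 3
sizes = (12, 120, 1200, 12000)
gaps = [abs(log_mgf_standardized(theta, n, p) - theta**2 / 2) for n in sizes]

for n, gap in zip(sizes, gaps):
    print(n, gap)
```

The printed gaps shrink as n grows, in line with the factor $n/\sigma^k$ tending to zero for $k \ge 3$.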

In order to complete this discussion, it is necessary to introduce two
very important theorems of advanced theoretical statistics. The first
theorem states that a distribution function is uniquely determined by
its moment-generating function. For example, if the moment-generating
function of a variable z is known to be $e^{\theta^2/2}$, then z must
be the standard normal variable. The second theorem states that,
if one distribution function has a moment-generating function which
approaches the moment-generating function of a second distribution
function, then the first distribution function approaches the second
distribution function. This theorem insures that the binomial variable
being studied approaches the standard normal variable, because by
(32) its moment-generating function approaches the moment-generating
function of the standard normal variable. A precise statement of these
two theorems, including conditions when they hold, will not be made
here; nevertheless frequent use will be made of them. A direct
application of these theorems to (32) yields the following theorem.

Theorem I. If x represents the number of successes in n independent
trials of an event for which p is the probability of success in a single
trial, then the variable $(x - np)/\sqrt{npq}$ has a distribution which
approaches the normal distribution with mean zero and standard deviation
1 as the number of trials becomes increasingly large.

This theorem justifies the previous use of normal curve methods for
approximating probabilities related to successive trials of an event
when n is large. Experience indicates that the approximation is fairly
good as long as, for $p \le \tfrac12$, $np > 5$ at least. Obviously, a very
small value of p together with a moderately large value of n would
yield a small mean and thus produce a skewed distribution; hence the
necessity for including p in this empirical rule. Figures 6 and 7
indicate how rapidly the distribution of the variable $(x - np)/\sqrt{npq}$
approaches normality when $p = \tfrac13$ and n = 24 and 48, respectively.
The common y scale for these two graphs is approximately
17 times that for the x axis.

There are numerous occasions when it is more convenient to work
with the percentage of successes in n trials than with the actual number
of successes. Since

$$\frac{x - np}{\sqrt{npq}} = \frac{\dfrac{x}{n} - p}{\sqrt{\dfrac{pq}{n}}}$$

it follows as a corollary of Theorem I that the percentage of successes,
x/n, is approximately normally distributed with mean p and standard
deviation $\sqrt{pq/n}$, provided that n is sufficiently large. The word
percentage will be used here and later to mean the decimal ratio.

Applications. Certain types of practical problems dealing with per¬ 
centages can be solved by means of this normal approximation. As 
a first illustration, consider the following simple genetics problem. 
According to Mendelian inheritance, certain crosses of peas should 
give yellow and green peas in the ratio of 3:1. In an experiment, 
176 yellow and 48 green peas were obtained. Do these results conform 
to theory? 

Here the 224 peas may be treated as 224 trials of an event for which
the probability of obtaining a yellow pea in a single trial is 3/4. Then

$$p = \tfrac34, \quad n = 224, \quad m = np = 168, \quad \sigma = \sqrt{npq} = 6.5$$

Fig. 6. Binomial distribution of $(x - np)/\sqrt{npq}$ for $p = \tfrac13$ and n = 24.

Fig. 7. Binomial distribution of $(x - np)/\sqrt{npq}$ for $p = \tfrac13$ and n = 48.

From the experimenter's point of view, an experiment corroborates
theory if its results are sufficiently close to expectation. In this problem
it is therefore a question of deciding whether 176 is sufficiently
close to 168. Since poor experimental results correspond to large
deviations from the mean, whether positive or negative, it is a question
of how large a deviation numerically should be tolerated before the
experiment will be judged as not conforming to theory. It is customary
for many statisticians to determine a critical deviation such that the
probability is 0.05 (or 0.01) of obtaining a deviation larger numerically
than the critical deviation and to declare an experimental result as
not conforming unless its deviation is less numerically than the critical
one. In order to determine the 0.05 critical deviation, it would be
necessary to calculate and sum the probabilities of the binomial
distribution corresponding to this problem beginning at the mean and
expanding symmetrically until a total probability of 0.95 had been
obtained. However, this critical deviation can be determined
approximately very easily by using the normal approximation to this binomial
distribution. Because $|t| = 2$ corresponds to an interval of 2 standard
deviations on both sides of the mean and this interval includes about
95% of the normal curve area, it is customary to choose $|t| = 2$ as
the critical value of $|t|$ rather than the more accurate Table II value
of $|t| = 1.96$. For this problem, therefore, an experimental result
would be declared to conform to theory if it fell within the interval

$$m \pm 2\sigma = 168 \pm 13$$

Since 176 falls well within this interval, there is no reason on this basis
for doubting that Mendelian inheritance is operating here.
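
The decision rule for the pea experiment reduces to a few lines of arithmetic. This sketch uses the customary $|t| = 2$ critical value described above; the variable names are illustrative.

```python
from math import sqrt

n, p = 224, 3 / 4
observed_yellow = 176

m = n * p                          # expected number of yellow peas
sigma = sqrt(n * p * (1 - p))      # standard deviation from formulas (28)
critical = 2 * sigma               # customary |t| = 2 rule
conforms = abs(observed_yellow - m) < critical

print(m, round(sigma, 1), round(critical), conforms)
```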

As a second illustration, consider the following problem. From past 
experience the manufacturer of parts finds that when a machine is 
functioning properly 5% of the parts are defective on the average. 
During the course of a day's operation by a new operator, 400 parts are 
turned out, 30 of which are defective. Is the operator satisfactory? 

The answer to this question depends upon what is meant by the 
word satisfactory. Here it will be assumed that satisfactory means 
that the number of defective parts should not be greater than what 
could be reasonably attributed to chance for a normal operator. If 
this operator is considered to be a normal operator, the 400 parts may 
be thought of as 400 trials of an event for which the probability of 
obtaining a defective part in a single trial is 0.05; hence 

$$p = 0.05, \quad n = 400, \quad m = np = 20, \quad \sigma = \sqrt{npq} = 4.36$$

By means of the binomial distribution corresponding to this problem, 
it is possible to calculate the probability of obtaining 30 or more defec¬ 
tive parts. This probability could be obtained by using (24) to calculate 
the successive probabilities of obtaining 30, 31, ..., 400 defectives
and then adding these probabilities. It is much easier, however, to
approximate the sum of these probabilities by finding the area to the
right of 29.5 under the approximating normal curve. Here

$$t = \frac{x - m}{\sigma} = \frac{29.5 - 20}{4.36} = 2.18$$

From Table II the area to the right of t = 2.18 is 0.015; consequently
the probability is approximately 0.015 that a normal operator will turn
out 30 or more defective parts in a lot of 400. Now this one day's
experience may be thought of as but one of an indefinite sequence of
similar days' experiences for normal operators. This result may therefore
be interpreted by stating that a normal operator would have a day
as bad as or worse than this only about 3 days in every 200 on the average.
From the manufacturer's point of view, this operator has undoubtedly 
turned out more defective parts than can be reasonably attributed to 
chance; consequently he would be judged unsatisfactory from this 
point of view. 
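
Both routes mentioned above, the normal approximation and the direct sum of binomial probabilities, are easy to carry out by machine. The sketch below is illustrative only; it assumes statistics.NormalDist in place of Table II, and the exact binomial sum need not agree closely with the normal figure, since the binomial distribution for p = 0.05 is noticeably skewed.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, defectives = 400, 0.05, 30
q = 1 - p

# Normal approximation: area to the right of 29.5 under the fitted curve.
fitted = NormalDist(mu=n * p, sigma=sqrt(n * p * q))
approx = 1.0 - fitted.cdf(defectives - 0.5)

# Direct sum of formula (24) from 30 through 400 defectives.
exact = sum(comb(n, x) * p**x * q ** (n - x) for x in range(defectives, n + 1))

print(round(approx, 3), exact)
```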

The reasonableness of this decision depends upon the extent to which
the mathematical model used here represents the actual situation. If
the successive parts turned out by normal operators do not behave
like random samples from a binomial population, then one is not
justified in applying these methods. It might happen, as it often does,
for example, that the variability of normal operators is much larger
than that given by $\sigma = \sqrt{npq}$, or that the percentage of defective
parts varies with the day of the week or the condition of the machine.

As a third illustration, consider the problem just alluded to of determining
whether daily percentages of defectives may be treated as
random samples from a binomial population. Industrial experience
has shown that most production processes do not behave in this
idealized manner and that much valuable information is obtained
concerning the process if the order in which data are obtained is
preserved. A simple graphical method, called a quality control chart,
has been found highly useful for assisting in the solution of this problem.
Such a chart for the percentage of defectives is illustrated in Fig. 8.
The middle line is thought of as corresponding to the process percentage
defective, although it is usually merely the mean of past
daily percentages. The other two lines serve as control limits for daily
percentages of defectives. From (34) it will be observed that these
two control lines are spaced three standard deviations from the mean
line. Along the x axis are recorded the time units for successive samples.
If now the production process behaves in the idealized manner
and if the normal approximation to the binomial distribution may be
used, the probability that a daily percentage point when plotted on
this chart will fall outside the control band is approximately equal to
the probability that a normal variable will assume a value more than
three standard deviations away from its mean, which from Table II
is 0.003. Because of this small probability, it is reasonable to assume
that the production process is no longer behaving properly when a
point falls outside of the control band; consequently the production
engineer checks over the various steps in the process when this event
occurs. From an inspection of Fig. 8, it will be observed that the
process in question went out of control on the twelfth day.

Fig. 8. Control chart for fraction defective. (Horizontal axis: time units 1 to 14.)
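
The construction of such a chart can be sketched in a few lines. The sketch assumes that the standard deviation of a daily fraction defective is the usual binomial value $\sqrt{pq/n}$, with the limits placed three such standard deviations from the mean line as described above. The daily fractions, the daily output of 1,000 parts, and the mean line of 0.02 are hypothetical values chosen only for illustration, and the function names are not taken from the text.

```python
import math

def control_limits(p_bar, n):
    """Lower and upper 3-standard-deviation limits for a fraction defective."""
    sd = math.sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - 3 * sd), p_bar + 3 * sd

def out_of_control(fractions, p_bar, n):
    """Time units (numbered from 1) whose plotted point falls outside the band."""
    lower, upper = control_limits(p_bar, n)
    return [day for day, f in enumerate(fractions, start=1)
            if f < lower or f > upper]

# Hypothetical daily fractions defective for an output of 1,000 parts a day.
daily = [0.021, 0.018, 0.025, 0.019, 0.037, 0.022]
n, p_bar = 1000, 0.02

lower, upper = control_limits(p_bar, n)
flagged = out_of_control(daily, p_bar, n)

# Probability that a normal variable lies more than 3 standard deviations
# from its mean, the figure read from Table II in the text.
outside = math.erfc(3 / math.sqrt(2))
print(lower, upper, flagged, round(outside, 4))
```

With these made-up figures only the fifth day falls outside the band. In practice the mean line would be estimated from past daily percentages, as the text notes.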

Industrial experience shows that only rarely does a production process
behave in this idealized manner when the control-chart technique is
first applied. Nevertheless, the technique is highly useful because it
enables one to discover causes of a lack of control and thus to improve
on the production process until gradually statistical control has been
obtained.

This illustration and discussion of a quality control chart gives a 
very incomplete picture of how quality control methods operate. Such 
methods constitute an extensive field of applied statistics, and numerous 
articles and books concerning them are available. 


4. Poisson Distribution 

When p is very small, even though n is large, the normal approximation
to the binomial distribution may be poor; consequently some other
form of approximation is needed. The empirical rule that was suggested
just after Theorem I implies that a new form of approximation
is needed when np < 5. Such an approximation exists in the form of
the Poisson distribution function, which is defined by

(35)    $$P(x) = \frac{e^{-m} m^x}{x!}$$

Poisson approximation to the binomial. To verify the fact that
(35) does serve as a good approximation to the binomial for very small
p and very large n, consider the binomial distribution as p approaches
zero and n becomes infinite in such a manner that m = np remains
fixed. For this purpose it is convenient to compare the moment-generating
functions of these two distributions.

From (18) and (35) the moment-generating function of the Poisson
distribution is given by

$$M_x(\theta) = \sum_{x=0}^{\infty} e^{\theta x} \frac{e^{-m} m^x}{x!} = e^{-m} \sum_{x=0}^{\infty} \frac{(m e^{\theta})^x}{x!}$$

But this last sum is merely the expansion of $e^{m e^{\theta}}$; consequently

(36)    $$M_x(\theta) = e^{m(e^{\theta} - 1)}$$


Now the moment-generating function of the binomial distribution
is given by (27). Let it be denoted by $M_x'(\theta)$; it may be manipulated
as follows into a form resembling (36).

$$M_x'(\theta) = [q + p e^{\theta}]^n$$

$$= e^{n \log [q + p e^{\theta}]}$$

$$= e^{n \log [1 + p(e^{\theta} - 1)]}$$

$$= e^{np(e^{\theta} - 1) \log [1 + p(e^{\theta} - 1)]^{\frac{1}{p(e^{\theta} - 1)}}}$$

(37)    $$= e^{m(e^{\theta} - 1) \log [1 + p(e^{\theta} - 1)]^{\frac{1}{p(e^{\theta} - 1)}}}$$

Since e may be defined by

$$\lim_{z \to 0} [1 + z]^{\frac{1}{z}} = e$$

it will be observed that the expression in (37) whose logarithm is
being taken will play a similar role as p approaches zero. Thus,

$$\lim_{p \to 0} [1 + p(e^{\theta} - 1)]^{\frac{1}{p(e^{\theta} - 1)}} = e$$

Consequently,

(38)    $$\lim_{p \to 0} \log [1 + p(e^{\theta} - 1)]^{\frac{1}{p(e^{\theta} - 1)}} = 1$$

Since np = m is being held fixed, it is clear from (37), (38), and (36)
that

$$\lim_{p \to 0} M_x'(\theta) = M_x(\theta)$$

By the same argument as that given before Theorem I, this limit
implies that the binomial distribution approaches the Poisson distribution
under the given conditions. This demonstrates the following
theorem.

Theorem II. If the probability of success in a single trial, p, approaches
zero while the number of trials, n, becomes infinite in such a manner
that np = m remains fixed, the binomial distribution approaches the
Poisson distribution.
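
The convergence asserted by Theorem II can be checked numerically. The following sketch holds m = np fixed at 4 and measures the largest difference between corresponding binomial and Poisson probabilities as n grows. The particular values of n, the cutoff at x = 20, and the function names are choices made here for illustration.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of x successes in n independent trials."""
    if x > n:
        return 0.0
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, m):
    """Poisson probability of x with mean m, as in (35)."""
    return math.exp(-m) * m**x / math.factorial(x)

def max_gap(n, m, x_max=20):
    """Largest pointwise difference between the two distributions for x <= x_max."""
    p = m / n
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, m))
               for x in range(x_max + 1))

m = 4
gaps = {n: max_gap(n, m) for n in (12, 96, 768)}
for n, gap in gaps.items():
    print(f"n = {n:4d}, p = {m / n:.4f}, largest difference = {gap:.5f}")
```

The largest difference shrinks steadily as p decreases, which is the behavior the theorem describes.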

Figures 9 and 10 indicate how rapidly the binomial distribution
approaches the Poisson distribution. The dotted lines represent the
fixed Poisson histogram for m = 4, and the solid lines the binomial
histogram for p = and p = 1/24, respectively.

Fig. 9. Binomial (solid) and Poisson (dotted) distributions for m = 4 and p =

Fig. 10. Binomial (solid) and Poisson (dotted) distributions for m = 4 and p = 1/24.

Applications. As an illustration of a distribution that may be thought
of as possessing Poisson characteristics, consider the data of Table 2
on the distribution of yeast cells in the 400 squares of a hemacytometer.


TABLE 2

No. cells (x) per square     0    1    2    3    4    5    6    7    8    9   10
Observed frequency         103  143   98   42    8    4    2    0    0    0    0
Expected frequency         107  141   93   41   14    4    1    0    0    0    0

The procedure for obtaining the observed frequencies consists in
diluting the yeast cells in a liquid, thoroughly mixing the dilution, 
filling a counting chamber that has been ruled into 400 squares with 
the mixture, and then counting the number of yeast cells on each square 
under a microscope. If the mixture is thought of as consisting of yeast 
cells and groups of molecules of the liquid about equal in size to the 
yeast cells, the yeast cells will constitute only a very small percentage 
of such units of volume; nevertheless the total number of such units 
on one square of the hemacytometer is so large that several yeast cells 
may be found among them. The number of trials here corresponds to 
the total number of units on a square, and the number of successes 
corresponds to the number of yeast cells on the square. If the mixing 
has been thorough, one would expect the yeast cells to be distributed 
at random in the mixture and the units on a square to constitute a set 
of independent trials. 

The mean of x for Table 2 will be found to be $\bar{x} = 1.32$. If it is
assumed on the basis of the preceding discussion that x possesses
a Poisson distribution and if the value of m is approximated well by
$\bar{x}$, the theoretical or expected frequencies may be obtained from (35)
by computing the successive values of

$$400 \, \frac{e^{-1.32} (1.32)^x}{x!}$$

These frequencies are readily computed by computing each frequency, 
after the first, from the preceding frequency. The results of these 
computations correct to the nearest unit are given in Table 2. There 
appears to be excellent agreement here. 
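
The computation just described can be sketched as follows. The observed frequencies are those of Table 2; each expected frequency after the first is obtained from the preceding one by multiplying by m/x, which is the ratio of successive terms of (35). The variable names are introduced here for illustration.

```python
import math

# Observed frequencies from Table 2 for x = 0, 1, ..., 10 cells per square.
observed = [103, 143, 98, 42, 8, 4, 2, 0, 0, 0, 0]

total = sum(observed)                                       # 400 squares
mean = sum(x * f for x, f in enumerate(observed)) / total   # sample mean of x
m = round(mean, 2)                                          # 1.32, used as the Poisson mean

# First expected frequency from (35), then each one from its predecessor:
# P(x) / P(x - 1) = m / x.
expected = [total * math.exp(-m)]
for x in range(1, len(observed)):
    expected.append(expected[-1] * m / x)

rounded = [round(e) for e in expected]
print(m, rounded)
```

Rounded to the nearest unit, these values reproduce the expected frequencies listed in Table 2.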

If there had been poor agreement between the observed and expected 
frequencies, it might have been caused by the lack of realism in the 
mathematical model, or by the lack of randomness in the distribution 
of yeast cells because of poor experimental technique. It is customary 
in work of this type to use the lack of agreement as evidence of poor 
technique rather than to question the reality of the Poisson assumption. 

5. Multinomial Distribution 

The binomial distribution is applicable to a discrete variable that 
assumes only one or the other of two values. For situations in which 
more than two values are possible and desirable, a generalization of 
the binomial distribution is needed. This generalization is known as the 
multinomial distribution. It will be of particular value in later theory. 
Strictly speaking, the multinomial distribution is a distribution function
of several variables; nevertheless it will be included here because
of its intimate relation to the binomial distribution and because it
arises from considering a single variable which can assume only a discrete
set of values.

Consider a discrete variable x which at any trial of the event can
assume one and only one of the values $x_1, x_2, \cdots, x_k$. Let the probability
that x will assume the value $x_i$ be denoted by $p_i$. Then, in n
trials of the event, the probability that $x_i$ will be assumed $n_i$ times
$(i = 1, 2, \cdots, k)$, where $\sum_{i=1}^{k} n_i = n$, can be obtained in the following
manner.

Consider the particular sequence of events given by

$$\underbrace{x_1, \cdots, x_1}_{n_1}, \; \underbrace{x_2, \cdots, x_2}_{n_2}, \; \cdots, \; \underbrace{x_k, \cdots, x_k}_{n_k}$$

From (21) it follows that the probability of obtaining this particular
order of events is

$$p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$$


Now every arrangement of this set of x's has this same probability of
occurring and satisfies the conditions of the problem; consequently it
is necessary to count the number of such arrangements. But this is
merely the number of permutations of n things of which $n_1$ are alike,
$n_2$ are alike, etc., which by (22) is

$$\frac{n!}{n_1! \, n_2! \cdots n_k!}$$

It therefore follows from (19) that the probability of obtaining $n_1$ $x_1$'s,
$n_2$ $x_2$'s, etc., is

(39)    $$\frac{n!}{n_1! \, n_2! \cdots n_k!} \, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}$$

where $\sum_{i=1}^{k} n_i = n$. This expression is called the multinomial distribution
function.

The name arises from the fact that (39) represents the general term
in the expansion of the multinomial

(40)    $$(p_1 + p_2 + \cdots + p_k)^n$$

just as the binomial distribution function represents the general term
in the expansion of $(q + p)^n$. In order to verify this relationship, it is
merely necessary to expand the multinomial step by step as a binomial.
Thus, when (40) is written in the form $(p_1 + q_1)^n$ and expanded, it
will be observed that the only term involving $p_1^{n_1}$ is the term

$$\frac{n!}{n_1! \, (n - n_1)!} \, p_1^{n_1} q_1^{n - n_1}$$

Then if $q_1^{n - n_1} = (p_2 + q_2)^{n - n_1}$ is expanded, it will be observed that the
only term involving $p_2^{n_2}$ is the term

$$\frac{(n - n_1)!}{n_2! \, (n - n_1 - n_2)!} \, p_2^{n_2} q_2^{n - n_1 - n_2}$$

If this procedure is continued and the resulting terms are combined,
the expression (39) will be obtained.

As an illustration of the multinomial distribution, consider the following
problem. The diameters of 10 sample parts were measured with
a precise measuring instrument. Of these 10 measurements, 3 ended
with a zero, 2 ended with a 5, and 5 ended with some other integer.
What is the probability (a) of obtaining a set of end digits like this
if the integers from 0 to 9 are equally likely to occur as end digits in
measurements of this kind, (b) of obtaining the expected set consisting
of 1 zero, 1 five, and 8 other integers?

(a) Here $p_1 = 1/10$, $p_2 = 1/10$, $p_3 = 8/10$, $n_1 = 3$, $n_2 = 2$, $n_3 = 5$;
consequently by (39) the desired probability is

$$\frac{10!}{3! \, 2! \, 5!} \left(\frac{1}{10}\right)^3 \left(\frac{1}{10}\right)^2 \left(\frac{8}{10}\right)^5 = 0.0083$$

(b) Here $p_1 = 1/10$, $p_2 = 1/10$, $p_3 = 8/10$, $n_1 = 1$, $n_2 = 1$, $n_3 = 8$;
consequently the desired probability is

$$\frac{10!}{1! \, 1! \, 8!} \left(\frac{1}{10}\right) \left(\frac{1}{10}\right) \left(\frac{8}{10}\right)^8 = 0.1510$$

The practical interpretation of probabilities like these will be considered
in later chapters; however, there appears to be evidence in these two
probabilities that the individual taking the measurements was careless
in the use of the instrument.
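
Both probabilities follow directly from (39), and a short sketch makes the evaluation explicit. The function name `multinomial_probability` is introduced here for illustration; the counts and probabilities are those of the end-digit problem.

```python
import math

def multinomial_probability(counts, probs):
    """Evaluate (39): n!/(n_1! ... n_k!) * p_1^n_1 ... p_k^n_k."""
    n = sum(counts)
    coefficient = math.factorial(n)
    for c in counts:
        coefficient //= math.factorial(c)   # exact integer division at every step
    value = float(coefficient)
    for c, p in zip(counts, probs):
        value *= p ** c
    return value

# End digit 0, end digit 5, any other end digit.
probs = [0.1, 0.1, 0.8]
prob_a = multinomial_probability([3, 2, 5], probs)   # the observed set
prob_b = multinomial_probability([1, 1, 8], probs)   # the expected set
print(f"(a) {prob_a:.4f}   (b) {prob_b:.4f}")
```

The expected set is roughly eighteen times as probable as the observed one, which is the contrast the text draws on.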


THEORETICAL DISTRIBUTIONS FOR TESTING HYPOTHESES 

The theoretical distributions that have been studied in this chapter 
may serve as the basis for testing certain types of statistical hypotheses, 
which in turn may be used to solve certain types of problems. The 
second of the three problems considered in the section on applications
of the binomial distribution illustrates one problem of this kind. In 
that problem the decision as to whether the new operator was to be 
judged satisfactory was made to depend upon the binomial distribution 
and a probability calculated on the assumption that p = 0.05 held for 
the new operator. 

1. Nature of Statistical Hypotheses 

A statistical hypothesis is usually an assumption about a population
parameter. For example, in testing the “honesty” of a coin, the
hypothesis might be the assumption that p = 1/2 for the binomial
population involved. For the problem concerning the machine operator,
the hypothesis might be the assumption that p = 0.05. It is not
customary to incorporate any assumption concerning the form of the 
population distribution function in the hypothesis. The form of this 
function is assumed known from other considerations. For example, in 
coin tossing, it is clear that the binomial distribution would be expected 
to hold. For the machine operator, however, one could not be certain 
without a satisfactory check that the successive parts turned out by 
the operator behave like independent trials of an event for which the 
probability of success is constant. The control-chart technique was 
introduced as a method for checking such assumptions. 

2. Tests of Statistical Hypotheses

A test of a statistical hypothesis is a procedure for deciding whether 
to accept or reject the hypothesis. For example, in the problem of the 
machine operator, the procedure consisted in calculating the probability 
that a normal operator would turn out 30 or more defective parts and 
then making a decision on the basis of this probability. When such a 
probability has been calculated and it turns out to be very small, two 
explanations are possible. Either the hypothesis and its related 
assumptions are false, or else a rare event occurred. It is customary 
to choose the first of these two alternatives whenever the probability 
involved is less than a fixed value $\alpha$ called the significance level of the
test. In this book the value of $\alpha = 0.05$ will ordinarily be selected.
If the probability in question turns out to be less than $\alpha$, the result is
said to be significant. Thus, the hypothesis being tested will be
rejected whenever a significant result is obtained. If an individual
follows this rule of procedure with $\alpha = 0.05$, say, he will incorrectly
reject true hypotheses only 5% of the time on the average. If a smaller
error of this type is desired, a smaller value of $\alpha$ may be selected;
however, if $\alpha$ is made very small there usually arises the danger that
false hypotheses will then be accepted a large percentage of the time.
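
The rule of procedure can be illustrated with the machine-operator problem. The sketch below assumes the hypothesis p = 0.05 for a lot of 400 and uses exact binomial probabilities rather than the normal approximation of the text. It finds the smallest number of defectives that would be judged significant at the 0.05 level and the probability of wrongly rejecting a true hypothesis under that rule. The function names are introduced here for illustration.

```python
import math

def binomial_upper_tail(c, n, p):
    """Exact probability of c or more successes in n trials."""
    return sum(math.comb(n, x) * p**x * (1 - p)**(n - x)
               for x in range(c, n + 1))

def critical_count(n, p, alpha=0.05):
    """Smallest c for which the probability of c or more successes is below alpha."""
    c = 0
    while binomial_upper_tail(c, n, p) >= alpha:
        c += 1
    return c

n, p, alpha = 400, 0.05, 0.05
c = critical_count(n, p, alpha)
false_rejection_rate = binomial_upper_tail(c, n, p)
print(c, round(false_rejection_rate, 4))
```

Because the variable is discrete, the probability of falsely rejecting a true hypothesis comes out somewhat below the nominal level rather than exactly equal to it.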

The principal difficulty in testing hypotheses is knowing what 
probability should be calculated for making a decision. In some 
problems, such as the one just discussed, a careful consideration of 
its practical implications will suggest what hypothesis to test and what 
probability to calculate in testing it. It does not follow, however, 
that a test which intuitively appears to be satisfactory is necessarily 
efficient or even correct. A logical approach to this whole problem of 
testing hypotheses will be discussed in Chapter XI. In the chapters 
preceding Chapter XI, numerous tests will be presented and applied, 
largely on an intuitive basis; however, a large percentage of these tests 
have been shown to be highly efficient from the point of view of 
Chapter XI. 
EXERCISES


1. Given that $f(x) = 1$, $0 \le x \le 1$, find (a) $\mu_k'$ by integration, (b) $M_x(\theta)$,
(c) $\mu_k'$ from $M_x(\theta)$.

2. Given that $f(x) = cx$, $0 \le x \le 1$, find (a) c, (b) $\mu_k'$ by integration, (c) $M_x(\theta)$,
(d) $\mu_k'$ from $M_x(\theta)$.

3. Find the approximate value of the integral $\int e^{-x^2/2} \, dx$ by using 1/2-unit
intervals and Simpson's rule for numerical integration, and integrating from 0 to 3.

4. Find $\mu_k$ for the normal distribution by using the integral definition and
repeated integrations by parts.

5. Fit a normal curve to the histogram for the data of problem 4, Chapter II.

6. What is the probability of rolling a total of less than 7 with 2 dice?

7. Compare the chances of rolling a 4 with one die and rolling a total of 8 with
2 dice.

8. A, B, and C in order toss a coin. The first one to throw a head wins. What
are their respective chances of winning? Note that the game may continue
indefinitely.

9. Fourteen quarters and 1 five-dollar gold piece are in one purse, and 15 quarters
are in another. Ten coins are taken from the first and placed in the second,
and then 10 coins are taken from the second and placed in the first. Which purse
would you choose, and how much better off would you be?

10. Eight dice are rolled. Calling a 5 or 6 a success, find the probability of
getting (a) 3 successes, (b) at most 3 successes.

11. How many throws with 2 dice will be required in order that the probability
of getting a double 6 at least once will have the value 1/2, approximately?

12. Find m and $\sigma$ for the binomial distribution by using the definition of moments
and the fact that $\sum x^2 P(x) = \sum x(x - 1)P(x) + \sum xP(x)$.

13. A coin is tossed 12 times. Find the probability, both exactly and by the
normal curve approximation, of getting (a) 4 successes, (b) at most 4 successes.

14. A die is tossed 12 times. Counting a 5 or 6 as a success, what is the probability,
both exactly and approximately, of getting (a) 4 successes, (b) at most 4
successes?
15. A die is tossed 90 times. Find the probability of getting 15 aces (a) using
the formula and tables of factorials, (b) using the normal curve approximation.

16. Experience shows that 20% of a certain kind of seed germinates. If 50 out
of 400 seeds germinated, would an explanation be needed?

17. A coin is tossed 400 times. Would 215 heads be a reasonable result?

18. About 9% of the population is between 20 and 24. A city of 12,000 has
1,300 in this age group. Test for reasonableness, and comment.

19. A manufacturer has found from experience that 3% of his product is rejected
because of flaws. A new lot of 800 units comes up for inspection. (a) How many
units would reasonably be expected to be rejected? (b) What is the approximate
probability that less than 30 units will be rejected?

20. A life-insurance company has 1,000 policies averaging $2,000 on lives at
age 25. From a mortality table it is found that, of 89,032 alive at 25, 88,314 are
alive at 26. Find upper and lower values for the amount which the company
would reasonably be expected to pay out during the year on these policies.

21. If people did not change their views on a candidate for a period of a week
just before and including the election, approximately how large a sample would
you need to take at the beginning of that week in order to be able to predict, with
a probability of 0.95, the true percentage of votes to be cast for the candidate with
an error of less than 1% if the true percentage is 50%? What further assumption
are you making concerning expressing opinions and voting?

22. Roll a die 6 times. Call a 1 or 2 a success. Record the number of successes
for each of 30 such experiments. (a) Find the approximate probability of obtaining
a total number of successes further removed from expectation than this. (b)
Apply binomial distribution theory to find the expected frequencies of successes for
p = 1/3 and n = 6.

23. The following data are for the number of seeds germinating out of 10 on
damp filter paper for 80 sets of seeds. Fit a binomial distribution to these data.

x     0    1    2    3    4    5    6    7    8    9   10
f     6   20   28   12    8    6    0    0    0    0    0


24. In the manufacturing of parts the following data were obtained for the daily
percentage defective for a production averaging 1,000 parts a day. Construct a
control chart, and indicate times when production was out of control.

2.2, 2.3, 2.1, 1.7, 3.8, 2.5, 2.0, 1.6, 1.4, 2.6, 1.5, 2.8, 2.9, 2.6, 2.5, 2.6, 3.2, 4.6, 3.3,
3.0, 3.1, 4.3, 1.8, 2.6, 2.1, 2.2, 1.8, 2.4, 2.4, 1.6, 1.7, 1.6, 2.8, 3.2, 1.8, 2.6, 3.6, 4.2.



25. For n = 12 and p = 1/3, plot on the same piece of graph paper the (a) binomial
histogram, (b) Poisson histogram, (c) fitted normal curve by ordinates.
Note the extent to which (b) and (c) approximate the binomial.

26. Find m, $\sigma$, $\alpha_3$, and $\alpha_4$ for the Poisson distribution by using the values for the
binomial distribution and allowing n to become infinite with m fixed. On the
basis of these results, when would you expect the Poisson distribution to be nearly
normal?

27. Fit a Poisson function to the following data on the number of deaths from
the kick of a horse per army corps per year for 10 Prussian Army Corps for 20 years.

x      0    1    2    3    4
f    109   65   22    3    1


28. One cubic centimeter of a liquid suspected of containing a high density of
bacteria is diluted until the density is 1/10 of its previous value. Ten test tubes of
a nutrient material are inoculated with this dilution. (a) If the original density
was 30 bacteria per cubic centimeter, what is the probability that all 10 of the test
tubes will show growth, that is, contain at least 1 bacterium each? (b) What is
the probability that exactly 7 test tubes will show growth?

29. What is the probability in 12 rolls of a die that each side will come up twice?
Find some other possible result, if any, which has a better chance of occurring.



CHAPTER IV 


LARGE-SAMPLE THEORY OF ONE VARIABLE 

FREQUENCY DISTRIBUTIONS OF MORE THAN ONE VARIABLE 

1. Properties 

Let $x_1, x_2, \cdots, x_n$ be n variables whose joint distribution is of interest.
For example, if four different types of tests were being studied,
four variables might be the scores of individuals on those tests and it
would be of interest to know how abilities in these four subjects were
interrelated and distributed. As another example, it might be of interest
to investigate the interrelationship and distribution of the three
variables hardness, ductility, and malleability, if properties of metals
were being studied. The generalization of a distribution function of
one variable to one of several variables is made easier if one thinks in
terms of two variables because then the geometrical interpretation is
simple. Thus, a distribution function of two variables, x and y, would
be denoted by f(x, y) and would be represented by a surface in three
dimensions, just as a distribution function of one variable, f(x), was
represented by a curve in two dimensions. Similarly, a distribution
function of n variables would be denoted by $f(x_1, x_2, \cdots, x_n)$. It is
defined by analogy with (1), Chapter III, as that function for which

(1)    $$\int_{\alpha_n}^{\beta_n} \cdots \int_{\alpha_1}^{\beta_1} f(x_1, x_2, \cdots, x_n) \, dx_1 \, dx_2 \cdots dx_n = P[\alpha_i < x_i < \beta_i] \quad (i = 1, 2, \cdots, n)$$

where the expression on the right denotes the probability that all n
inequalities will be satisfied and where $\alpha_i < \beta_i$ are any two values of
the variable $x_i$. For two variables, (1) is easily interpreted geometrically.
It shows that the probability that a point in the x, y plane will
be in a given region is equal to the volume under the surface z = f(x, y)
which lies above this region.

For convenience, the letter f will always denote the distribution function
of the indicated variables. Thus, $f(x_i)$ denotes the distribution
function of the variable $x_i$ and may differ considerably for different
values of i. This notation differs from ordinary functional notation in
which f(y) would imply the value of f(x) when x is replaced by y.
Here f(y) means the distribution function of y and has no connection
with f(x). No confusion will arise in future work because of this notation,
and much explanation will be saved because of it.


2. Independent Variables 

Consider the situation when the variables $x_i$ are unrelated in a
probability sense. For example, suppose that one has two variables
representing respectively the length and pulse rate of male infants.
One would not expect to find these two variables related in the sense
that knowledge of the value of one of them would be of assistance in
predicting the value of the other. Concerning the four tests mentioned
in the preceding section, one would expect a relationship to exist between
test scores, unless these tests were on quite different subjects.
If these tests were designed to measure, say, arithmetical ability, 
manual dexterity, keenness of vision, and musical appreciation, then 
once more one would not be surprised if the scores on these four tests 
were unrelated. To say that variables like these are independent in 
a probability sense implies, for example, that the probability that an 
individual will make a score between 70 and 80 in arithmetical ability 
is independent of what score he made in the other three tests. For 
continuous variables, such a property would require a definition that 
does for intervals what (21), Chapter III, does for discrete variables. 
Such a definition is the following. 


(2) If $f(x_1, x_2, \cdots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$, the variables $x_i$ are said
to be independently distributed.

The desired feature in this definition will be apparent if property (1)
is applied to (2). Here

$$P[\alpha_i < x_i < \beta_i] = \int_{\alpha_n}^{\beta_n} \cdots \int_{\alpha_1}^{\beta_1} f(x_1) f(x_2) \cdots f(x_n) \, dx_1 \, dx_2 \cdots dx_n$$

$$= \int_{\alpha_1}^{\beta_1} f(x_1) \, dx_1 \int_{\alpha_2}^{\beta_2} f(x_2) \, dx_2 \cdots \int_{\alpha_n}^{\beta_n} f(x_n) \, dx_n$$

$$= P[\alpha_1 < x_1 < \beta_1] \, P[\alpha_2 < x_2 < \beta_2] \cdots P[\alpha_n < x_n < \beta_n]$$

This result states that the probability that the variables will lie in
any given region is equal to the product of the probabilities of the
individual variables lying in the intervals that determine the region.
This property is the analogue for continuous variables of the definition
of independent discrete variables given by (21), Chapter III.

3. Moments 

By extending definitions (7) and (8), Chapter III, for a single variable,
the definitions for the kth moment and the moment-generating
function of a function, $g(x_1, x_2, \cdots, x_n)$, of several variables become

(3)    $$\mu'_{k:g} = \int_{a_n}^{b_n} \cdots \int_{a_1}^{b_1} g^k(x_1, x_2, \cdots, x_n) f(x_1, x_2, \cdots, x_n) \, dx_1 \, dx_2 \cdots dx_n$$

and

(4)    $$M_g(\theta) = \int_{a_n}^{b_n} \cdots \int_{a_1}^{b_1} e^{\theta g(x_1, x_2, \cdots, x_n)} f(x_1, x_2, \cdots, x_n) \, dx_1 \, dx_2 \cdots dx_n$$

4. Sum of Independent Variables 

A very useful formula for later theory arises when the variables $x_i$
are independent and when $g(x_1, x_2, \cdots, x_n)$ is the special linear function

$$g(x_1, x_2, \cdots, x_n) = x_1 + x_2 + \cdots + x_n$$

From definitions (2) and (4), it follows that

$$M_{x_1 + \cdots + x_n}(\theta) = \int_{a_n}^{b_n} \cdots \int_{a_1}^{b_1} e^{\theta [x_1 + x_2 + \cdots + x_n]} f(x_1) f(x_2) \cdots f(x_n) \, dx_1 \, dx_2 \cdots dx_n$$

$$= \int_{a_1}^{b_1} e^{\theta x_1} f(x_1) \, dx_1 \int_{a_2}^{b_2} e^{\theta x_2} f(x_2) \, dx_2 \cdots \int_{a_n}^{b_n} e^{\theta x_n} f(x_n) \, dx_n$$

But each of the integrals on the right is the moment-generating function
of the indicated variable; consequently

(5)    $$M_{x_1 + \cdots + x_n}(\theta) = M_{x_1}(\theta) M_{x_2}(\theta) \cdots M_{x_n}(\theta)$$

This formula states that the moment-generating function of the sum
of a set of independent variables is equal to the product of their individual
moment-generating functions.
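
Formula (5) can be checked on a small case. The sketch below uses a discrete analogue rather than the continuous variables of the derivation: three independent rolls of a fair die, for which sums replace the integrals. The choice of dice, the value of $\theta$, and the function names are assumptions made here for illustration.

```python
import math
from itertools import product

faces = range(1, 7)   # a fair die, each face with probability 1/6

def mgf_single(theta):
    """Moment-generating function of one roll."""
    return sum(math.exp(theta * x) for x in faces) / 6

def mgf_sum(theta, n):
    """Moment-generating function of the total of n independent rolls,
    computed directly from the joint distribution of all n rolls."""
    outcomes = product(faces, repeat=n)
    return sum(math.exp(theta * sum(o)) for o in outcomes) / 6**n

theta, n = 0.3, 3
lhs = mgf_sum(theta, n)          # left side of (5)
rhs = mgf_single(theta) ** n     # right side of (5)
print(lhs, rhs)
```

The two sides agree to within floating-point rounding, as (5) requires for independent variables.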

RANDOM SAMPLING 

The idea of random sampling was considered briefly in section 1,
Chapter III. There it was treated largely from a practical point of
view. From a theoretical point of view, random sampling should
possess properties that correspond to some of the useful features of
what is considered practical random sampling. For example, suppose
that tree diameters are being sampled randomly. From a practical 
point of view, random sampling implies among other things that the 
relative frequency of tree diameters found in any given interval should 
approach the relative frequency in that interval for the entire forest. 
Thus, a sampling method that consistently selected too high a percentage
of average-size trees would not be considered random. This
does not mean that non-representative samples will not be obtained, but
rather that the method should be representative in nature. Furthermore,
if these successive samples are marked off into sets of, say, 20
each, then the first measurement of each set will form a set of tree 
diameters that should behave like a random sample. Of course the 
same should hold for the second measurement, and so on, as well. 
Finally, these successive samples should be independent of one another. 
For example, if the first tree selected happened to be unusually large, 
that should have no effect on the size of the second tree selected. A 
consideration of these desirable features for practical random sampling 
leads to the following definition of random sampling for theoretical 
distribution functions. 

Consider a single variable x with distribution function f(x). Let
$x_1, x_2, \cdots, x_n$ be a set of n values of x. This set will be called a sample
of size n drawn from the population represented by f(x). If repeated
samples of n each are considered, $x_i$ will be a variable which represents
the ith value of x in each set. Now, the sampling will be said to be
random if, in such repeated samples of n each, the $x_i$ are independently
distributed and each possesses the population distribution. Thus,
because of (2) the sampling is random if

(6)    $$f(x_1, x_2, \cdots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$$

where all the f's on the right are the same function as f(x).


DISTRIBUTION OF $\bar{x}$ FROM A NORMAL DISTRIBUTION


1. Theory 

Let x be normally distributed with mean m and standard deviation
$\sigma$, and let a random sample of size n be drawn from this normal population.
Denote the sample mean by

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

In repeated samples, each $x_i$ will be a variable; consequently in repeated
samples $\bar{x}$ will be a variable also. If $\bar{x}$ is treated as a variable, and
property (9), Chapter III, is used, its moment-generating function may
be written as

$$M_{\bar{x}}(\theta) = M_{[x_1 + \cdots + x_n]/n}(\theta) = M_{x_1 + \cdots + x_n}\left(\frac{\theta}{n}\right)$$

Since the sampling is random, the variables $x_i$ are independent and
therefore property (5) may be applied to give

$$M_{\bar{x}}(\theta) = M_{x_1}\left(\frac{\theta}{n}\right) M_{x_2}\left(\frac{\theta}{n}\right) \cdots M_{x_n}\left(\frac{\theta}{n}\right)$$

But random sampling also implies that all the variables $x_i$ have the
same distribution function, and hence the same moment-generating
function. Consequently, all the M's on the right are the same function,
namely, the moment-generating function of the variable x. Thus,

(7)    $$M_{\bar{x}}(\theta) = M_x^n\left(\frac{\theta}{n}\right)$$


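But from (33), Chapter III, it follows that, for x normally distributed,

$$M_{x - m}(\theta) = e^{\frac{\sigma^2 \theta^2}{2}}$$

By property (10), Chapter III, it follows that

(8)    $$M_x(\theta) = e^{m\theta + \frac{\sigma^2 \theta^2}{2}}$$

Consequently, from (7)

$$M_{\bar{x}}(\theta) = \left[ e^{m \frac{\theta}{n} + \frac{\sigma^2}{2} \left(\frac{\theta}{n}\right)^2} \right]^n = e^{m\theta + \frac{\sigma^2}{n} \frac{\theta^2}{2}}$$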

Since the expression on the right, when compared with (8), is seen to
be the moment-generating function of a normal variable with mean m
and standard deviation $\sigma/\sqrt{n}$, and since a moment-generating function
uniquely defines a distribution function, this result proves the following
theorem.


Theorem I. If x is normally distributed with mean m and standard
deviation $\sigma$ and random samples of size n are drawn, then the sample
mean, $\bar{x}$, will be normally distributed with mean m and standard deviation
$\sigma/\sqrt{n}$.

Theorem I shows how the precision of a sample mean for estimating
the population mean increases as the sample size is increased. Since
the standard deviation of $\bar{x}$ measures the variation of sample $\bar{x}$'s about
m and hence may be treated as a measure of the precision of estimating
m by means of $\bar{x}$, it is clear from the theorem that it is necessary to take
four times as large a sample if one desires to double the precision of an
estimate at hand.
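
A small sampling experiment illustrates this consequence of Theorem I. The sketch draws repeated random samples from a normal population and compares the standard deviation of the resulting sample means with $\sigma/\sqrt{n}$ for n = 25 and n = 100. The population values m = 10 and $\sigma$ = 2, the number of repetitions, the seed, and the function name are hypothetical choices made here for illustration.

```python
import math
import random
import statistics

def sd_of_sample_means(n, m, sigma, repetitions, rng):
    """Standard deviation of the means of repeated random samples of size n."""
    means = [
        statistics.fmean(rng.gauss(m, sigma) for _ in range(n))
        for _ in range(repetitions)
    ]
    return statistics.pstdev(means)

rng = random.Random(1947)        # fixed seed so that the run is repeatable
m, sigma = 10.0, 2.0             # hypothetical normal population

sd_25 = sd_of_sample_means(25, m, sigma, 4000, rng)
sd_100 = sd_of_sample_means(100, m, sigma, 4000, rng)

print(f"n = 25:  observed {sd_25:.3f}, theory {sigma / math.sqrt(25):.3f}")
print(f"n = 100: observed {sd_100:.3f}, theory {sigma / math.sqrt(100):.3f}")
```

Quadrupling the sample size from 25 to 100 roughly halves the observed standard deviation of the mean, in line with the theorem.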


2. Applications 

As an illustration of the application of the theorem, consider the 
following problem. The manufacturer of string has found from past 
experience that samples of a certain type of string have a mean breaking 
strength of 15.6 pounds and a standard deviation of 2.2 pounds. A 
time-saving change in the manufacturing process of this string is tried. 
A sample of 50 pieces is then taken, for which the mean breaking 
strength turns out to be 14.5 pounds and the standard deviation 2.1 
pounds. On the basis of this sample can it be concluded that the new 
process has had a harmful effect on the strength of the string? Now 
experience indicates that the breaking strength of string is approximately
normally distributed. Hence, it will be assumed that the
breaking strength, x, is normally distributed with m = 15.6 and
$\sigma = 2.2$. Now, following the procedure for testing hypotheses outlined
in the preceding chapter, set up the hypothesis that the new process has
not affected the breaking strength of the string. Then the sample may
be treated as a random sample of size 50 from the specified normal
population. Consequently, by Theorem I, $\bar{x}$ will be normally distributed
with

$m_{\bar{x}} = 15.6$ and $\sigma_{\bar{x}} = 2.2/\sqrt{50} = 0.31$

The $\bar{x}$ normal curve will therefore possess only about 1/7 the spread
of the x normal curve. The value of $\bar{x}$ for this one sample of 50 was
14.5; hence, the corresponding value of t is

$t = \frac{14.5 - 15.6}{0.31} = -3.55$


From Table II, the probability of obtaining a value of $t < -3.55$, and
hence a value of $\bar{x} < 14.5$, is only about 0.0002. Since this probability
is much smaller than the probability of 0.05 being used for judging 
significance, the value 14.5 is highly significant and accordingly the 
hypothesis will be rejected. It appears that the new process produces 
string of a slightly lower mean breaking strength. 
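
The arithmetic of this test can be reproduced as follows. This is a sketch only: the helper `normal_cdf` is a hypothetical stand-in for Table II, built on the error function from Python's standard library, and the variable names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

m, sigma, n = 15.6, 2.2, 50      # hypothesized mean, assumed s.d., sample size
x_bar = 14.5                     # observed sample mean

se = sigma / math.sqrt(n)        # standard deviation of the sample mean
t = (x_bar - m) / se             # standardized value of the sample mean
p_lower = normal_cdf(t)          # probability of a mean this small or smaller

print(round(se, 3), round(t, 2), round(p_lower, 5))
```

With the unrounded standard deviation the standardized value comes out near -3.54 rather than -3.55, since the text divides by the rounded figure 0.31; the probability remains about 0.0002 and the conclusion is unchanged.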

For problems of the type just considered, it is rather common in
applied statistics to call $\sigma/\sqrt{n}$ the standard error of the mean. The
name standard error is also used in connection with statistics other
than the mean, being always the same as the standard deviation of
that statistic. The expression probable error is also fairly common
in some circles. It is related to the standard error by means of the
approximate formula P.E. = 0.6745 S.E. For a normal variable x, the
probability is 1/2 that x will fall in the interval m ± P.E. Since it is
more convenient to work with standard deviations than with probable
errors, the use of the probable error is gradually being abandoned.
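
The factor 0.6745 can be checked directly. The sketch below assumes only that x is normal; the function name `prob_within` is hypothetical, and the error function from the standard library takes the place of a normal table.

```python
import math

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

print(round(prob_within(0.6745), 4))   # interval m +/- P.E.
print(round(prob_within(1.0), 4))      # interval m +/- one standard deviation
```

The first probability is 1/2 to four decimals, which is what defines the probable error.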

As another illustration, consider the following problem. An estimate
of the new population mean in the preceding problem is desired which
will be correct to 1/2 pound. How large a sample is necessary to be
reasonably certain that the resulting mean will not differ from the true
mean by more than 1/2 pound? Since the sample standard deviation
turned out to be very close to the previous population standard deviation,
it will be assumed that the new population standard deviation
is also 2.2. Let m be the new population mean and as before assume
that x is normally distributed. Then $\bar{x}$ will be normally distributed
with mean m and standard deviation $2.2/\sqrt{n}$. The requirement of
being reasonably certain will be understood to mean being certain with
a probability of 0.95. Hence, in order to have sample values of $\bar{x}$
fall within 1/2 pound of m 95% of the time, it is necessary that 1/2 pound
correspond to two standard deviations for the $\bar{x}$ distribution. Therefore,
n must be such that

$\frac{1}{2} = 2\frac{\sigma}{\sqrt{n}} = 2\frac{2.2}{\sqrt{n}} = \frac{4.4}{\sqrt{n}}$

The solution of this equation is n = 77. Since a sample of size 50 is
already available, only about 27 additional observations would be
needed.
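
The sample-size calculation amounts to solving $k\sigma/\sqrt{n} = 1/2$ for n. A minimal sketch, with illustrative variable names and with k = 2 taken from the two-standard-deviation rule used above:

```python
sigma = 2.2        # assumed population standard deviation (pounds)
half_width = 0.5   # desired accuracy of the mean (pounds)
k = 2.0            # standard deviations corresponding to a probability of about 0.95

n_required = (k * sigma / half_width) ** 2
already_have = 50

print(round(n_required, 2))
print(round(n_required) - already_have)   # additional observations, rounded as in the text
```

The exact solution is 77.44, which the text rounds to 77; rounding up to 78 would be the more cautious choice, and either way about 27 or 28 further observations are called for.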

It is to be noted that the problem just solved specified only the
magnitude of the deviation of the sample mean from the population
mean in contrast to the preceding problem in which the question was
whether the sample mean was too small to be attributed to sampling
variation. It should also be noted that the population standard deviation
was assumed known in both these problems. In most problems
the population standard deviation is not known. Then the sample
value of the standard deviation is often used in place of the unknown
population value; however, this procedure introduces an error. The
error is not serious for large samples; but for small samples a more
refined procedure that does not require such approximations is necessary.
Such methods will be considered in Chapter VIII.

DISTRIBUTION OF $\bar{x}$ FROM NON-NORMAL DISTRIBUTIONS


1. Theory 

Since many variables of interest possess distributions that are not 
even approximately normal, it is important to know to what extent the 
theory developed on the basis of assuming normality holds for other 
distributions. Here it will be assumed that x is no longer normally
distributed but merely possesses a distribution for which the moment-generating
function exists. Then it will be shown that the distribution
of $\bar{x}$ approaches a normal distribution as the size of the sample increases.

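Consider the variable $t = (\bar{x} - m)\sqrt{n}/\sigma$ and its moment-generating
function. If properties (9) and (10), Chapter III, and formula (7)
are applied,

$M_t(\theta) = e^{-m\sqrt{n}\,\theta/\sigma}\, M_{\bar{x}}\left(\frac{\sqrt{n}\,\theta}{\sigma}\right) = e^{-m\sqrt{n}\,\theta/\sigma}\, M_x^n\left(\frac{\theta}{\sigma\sqrt{n}}\right)$

Taking logarithms of both sides to the base e gives

$\log M_t(\theta) = -\frac{m\sqrt{n}\,\theta}{\sigma} + n \log M_x\left(\frac{\theta}{\sigma\sqrt{n}}\right)$

Replacing $M_x(\theta/(\sigma\sqrt{n}))$ by its expanded form as given by (5), Chapter
III, yields

$\log M_t(\theta) = -\frac{m\sqrt{n}\,\theta}{\sigma} + n \log\left[1 + \mu_1' \frac{\theta}{\sigma\sqrt{n}} + \mu_2' \frac{\theta^2}{2\sigma^2 n} + \cdots\right]$

If $|\theta|$ is chosen sufficiently small, the logarithm on the right may be
treated as of the form log [1 + z]; hence

$\log M_t(\theta) = -\frac{m\sqrt{n}\,\theta}{\sigma} + n\left\{\left[\mu_1' \frac{\theta}{\sigma\sqrt{n}} + \mu_2' \frac{\theta^2}{2\sigma^2 n} + \cdots\right] - \frac{1}{2}\left[\mu_1' \frac{\theta}{\sigma\sqrt{n}} + \mu_2' \frac{\theta^2}{2\sigma^2 n} + \cdots\right]^2 + \cdots\right\}$

$= \left(-\frac{m\sqrt{n}}{\sigma} + \mu_1' \frac{\sqrt{n}}{\sigma}\right)\theta + \left(\frac{\mu_2' - (\mu_1')^2}{\sigma^2}\right)\frac{\theta^2}{2} + \text{terms in } \theta^k,\ k \ge 3$

Since $m = \mu_1'$ and $\sigma^2 = \mu_2' - (\mu_1')^2$,

(9) $\log M_t(\theta) = \frac{\theta^2}{2} + \text{terms in } \theta^k,\ k \ge 3$

From an inspection of terms in $\theta^k$, it will be seen that the only function
of n which they contain is the factor $n^{-k/2+1}$. Since $k \ge 3$, all such
terms will approach zero as n becomes infinite; consequently

$\lim_{n \to \infty} \log M_t(\theta) = \frac{\theta^2}{2}$

which implies that

$\lim_{n \to \infty} M_t(\theta) = e^{\theta^2/2}$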

As in Theorem I, Chapter III, whose proof resembles this proof, 
these various expansions and limits require a knowledge of advanced 
calculus methods for their justification. Since the last limit is the 
moment-generating function of the standard normal distribution, the 
preceding arguments prove the following theorem. 

Theorem II. If x has a distribution with mean m and standard
deviation $\sigma$ for which the moment-generating function exists, then the
variable $t = (\bar{x} - m)\sqrt{n}/\sigma$ has a distribution which approaches the
standard normal distribution as n becomes infinite.
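
The limit used in the proof can be watched numerically for one particular non-normal distribution. The sketch below assumes the density $f(x) = e^{-x}$, $x \ge 0$, whose moment-generating function is $1/(1 - \theta)$ for $\theta < 1$ and whose mean and standard deviation are both 1. The choice of distribution and the function name `log_mgf_t` are illustrative assumptions, not part of the theorem. The function evaluates $\log M_t(\theta) = -m\sqrt{n}\,\theta/\sigma + n \log M_x(\theta/(\sigma\sqrt{n}))$ and the loop compares it with $\theta^2/2$.

```python
import math

def log_mgf_t(theta, n):
    """log M_t(theta) for the standardized mean of n exponential variables.

    Assumes f(x) = exp(-x), x >= 0, so that M_x(u) = 1 / (1 - u) for u < 1,
    with mean m = 1 and standard deviation sigma = 1.
    """
    m, sigma = 1.0, 1.0
    u = theta / (sigma * math.sqrt(n))
    if u >= 1.0:
        raise ValueError("theta / (sigma * sqrt(n)) must be less than 1")
    log_mx = -math.log(1.0 - u)
    return -m * math.sqrt(n) * theta / sigma + n * log_mx

theta = 1.0
for n in (4, 100, 10000):
    # The second printed value should drift toward theta**2 / 2 as n grows.
    print(n, round(log_mgf_t(theta, n), 4), theta ** 2 / 2)
```

For this strongly skewed distribution the value is still well above $\theta^2/2$ at n = 4 and close to it by n = 10000, in line with the remark that larger samples are needed the further the distribution is from normality.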

From a practical point of view, this theorem is exceedingly important 
because it permits the use of normal curve methods on problems related 
to means of the type illustrated in the previous section even when 
the basic variable x has a distribution that differs considerably from 
normality. Of course the more the distribution differs from normality,
the larger must n become to guarantee approximate normality for $\bar{x}$.
Sampling experiments have shown that for n > 50 the form of f(x)
has little influence on the form of $f(\bar{x})$ for ordinary types of f(x).

From (9) it will be observed that the expansion of $M_t(\theta)$ would contain
no term in $\theta$ and that the coefficient of $\theta^2/2$ would be 1; consequently
t is a standard variable. This shows that the formulas

(10) $m_{\bar{x}} = m$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

hold in general and hence do not depend upon any normality assumption.

2. Applications

The control-chart technique introduced in section 3, Chapter III,
was designed to check on successive sample percentages to determine
whether they behaved like random samples from a binomial distribution.
A similar chart may be constructed for sample means. Because
of Theorem II it is not essential that the basic variable be exactly
normally distributed for such charts; consequently they are of wider
applicability. Such a chart is shown in Fig. 1. It should be noted
that the control band is a 3-standard-deviation band about the mean,
and therefore for a normal variable the probability would be only
0.003 of a point falling outside this band. These particular control
limits are chosen because industrial experience has found them to be
especially useful. Since many industrial variables are not normally
distributed, and since the sample means used in control charts are often
based on only 4 or 5 measurements, one could hardly expect the probability
of 0.003 to be very realistic. It will be observed from Fig. 1
that the process appears to be under control.

Fig. 1. Control chart for the mean.
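
The 3-standard-deviation band is simple to compute. The sketch below is an illustration under assumed values: the process mean, the process standard deviation, and the hourly sample means are all hypothetical numbers, and `control_limits` is a hypothetical helper name.

```python
import math

def control_limits(mean, sigma, n, k=3.0):
    """Lower and upper control limits for means of samples of size n."""
    half_band = k * sigma / math.sqrt(n)
    return mean - half_band, mean + half_band

# Hypothetical process values, chosen only for illustration.
process_mean, process_sigma, sample_size = 70.0, 30.0, 5
lower, upper = control_limits(process_mean, process_sigma, sample_size)

sample_means = [69.4, 62.6, 56.6, 71.6]      # hypothetical hourly sample means
out_of_control = [x for x in sample_means if not lower <= x <= upper]

print(round(lower, 2), round(upper, 2), out_of_control)
```

A sample mean falling outside the two limits would be the signal to look for an assignable cause; with these made-up figures none does.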

DISTRIBUTION OF THE DIFFERENCE OF TWO MEANS 
1. Theory 

A frequent problem in science is to determine whether real differences 
exist between two sets of similar data. One method of treating the 
problem statistically is to determine whether it is highly probable that 
real differences exist between the means of the populations from which 
the data were assumed taken. 

Let $\bar{x}$ and $\bar{y}$ be the sample means of two sets of data based on random
samples of sizes $n_x$ and $n_y$ respectively. These two means may be
treated as the first pair of samples in repeated sampling; consequently
$\bar{x}$ and $\bar{y}$ may be treated as variables for which one pair of values is
available. Since the samples are random, $\bar{x}$ and $\bar{y}$ will be independently
distributed. If x and y are normally distributed, or if $n_x$ and $n_y$ are
sufficiently large, $\bar{x}$ and $\bar{y}$ will be normally distributed, or approximately
so. It will be assumed therefore that $\bar{x}$ and $\bar{y}$ are normally distributed.
Now consider the moment-generating function of the variable $\bar{x} - \bar{y}$.
If property (5), property (9), Chapter III, formula (8), and formulas
(10) are applied,

$M_{\bar{x}-\bar{y}}(\theta) = M_{\bar{x}}(\theta)\, M_{-\bar{y}}(\theta)$

$= M_{\bar{x}}(\theta)\, M_{\bar{y}}(-\theta)$

$= e^{m_x\theta + \frac{\sigma_x^2}{n_x}\frac{\theta^2}{2}} \cdot e^{-m_y\theta + \frac{\sigma_y^2}{n_y}\frac{\theta^2}{2}}$

$= e^{(m_x - m_y)\theta + \left(\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}\right)\frac{\theta^2}{2}}$

Since this expression is the moment-generating function of a normal
variable, this result proves the following theorem.


Theorem III. If $\bar{x}$ and $\bar{y}$ are normally and independently distributed,
then $\bar{x} - \bar{y}$ is normally distributed with mean $m_{\bar{x}-\bar{y}} = m_x - m_y$ and
standard deviation

$\sigma_{\bar{x}-\bar{y}} = \sqrt{\frac{\sigma_x^2}{n_x} + \frac{\sigma_y^2}{n_y}}$

2. Applications 

Consider the following problem. A potential buyer of light bulbs 
bought 50 bulbs of each of two brands. Upon testing these bulbs, he 
found that brand A had a mean life of 1,282 hours with a standard 
deviation of 80 hours, whereas brand B had a mean life of 1,208 hours 
with a standard deviation of 94 hours. Do the two brands differ in 
quality? To answer this question, set up the hypothesis that the two 
samples came from normal populations with the same means. The
samples are evidently independent; therefore by Theorem III, $\bar{x} - \bar{y}$
is normally distributed with

$m_{\bar{x}-\bar{y}} = 0$ and $\sigma_{\bar{x}-\bar{y}} = \sqrt{\frac{\sigma_x^2}{50} + \frac{\sigma_y^2}{50}}$

Since $\sigma_x^2$ and $\sigma_y^2$ are unknown, it is necessary to estimate them from
their sample values. Such approximations introduce an error, but for
samples as large as 50 this error is not serious. It can be shown that
the error in $\sigma_{\bar{x}-\bar{y}}$ would probably not exceed 10% here. With these
approximations,

$m_{\bar{x}-\bar{y}} = 0$ and $\sigma_{\bar{x}-\bar{y}} = \sqrt{\frac{(80)^2}{50} + \frac{(94)^2}{50}} = 17.5$

If, as before, a significance level of 0.05 is chosen, then a value of $\bar{x} - \bar{y}$
exceeding 35 will be judged significant. Since $\bar{x} - \bar{y} = 74$ here, this
difference is highly significant, and therefore the hypothesis of equal
means is rejected. It seems quite certain that the two brands differ
in quality as far as mean burning time is concerned. Although it was
assumed that x and y were normally distributed, it would have sufficed
to assume that $\bar{x}$ and $\bar{y}$ were normally distributed. Since the values
of $n_x$ and $n_y$ are sufficiently large to make the latter assumption highly
reasonable, the above significant difference cannot very well be attributed
to a possible lack of normality for burning time.
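
The computation for the light-bulb example can be laid out as follows. This is a sketch: the function name `sd_of_mean_difference` is hypothetical, and, as in the text, the sample standard deviations are used in place of the unknown population values.

```python
import math

def sd_of_mean_difference(s_x, n_x, s_y, n_y):
    """Standard deviation of x_bar - y_bar for independent samples."""
    return math.sqrt(s_x ** 2 / n_x + s_y ** 2 / n_y)

# Brand A and brand B figures from the example; sample standard
# deviations stand in for the unknown population values.
mean_a, s_a, n_a = 1282.0, 80.0, 50
mean_b, s_b, n_b = 1208.0, 94.0, 50

sd_diff = sd_of_mean_difference(s_a, n_a, s_b, n_b)
difference = mean_a - mean_b
critical = 2.0 * sd_diff           # two-standard-deviation rule for the 0.05 level

print(round(sd_diff, 1), round(critical), difference > critical)
```

The observed difference of 74 hours is more than four standard deviations from zero, well beyond the critical value of about 35.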

After a test has indicated significant differences, it is usually of 
interest to determine how large a difference in population means may 
be reasonably assumed. This problem will be considered in Chapter 
VIII. 

DISTRIBUTION OF THE DIFFERENCE OF TWO PERCENTAGES 
1. Theory 

If two sets of data drawn from binomial distributions are to be 
compared, it is necessary to work with percentages of successes rather 
than with the number of successes, unless the number of trials in each 
set is the same. For example, 40 heads in 100 tosses of a coin would 
not be compared with 30 heads in 50 tosses unless they were both placed 
on a percentage basis. Now from Theorem I, Chapter III, and (34), 
Chapter III, it follows that the percentage of successes, $p' = x/n$, may
be assumed to be normally distributed with mean p and standard deviation
$\sqrt{pq/n}$ provided that n is large.

Let $p_1'$ and $p_2'$ be two independent sample percentages based on $n_1$
and $n_2$ trials, respectively, from binomial distributions with probabilities
$p_1$ and $p_2$, respectively. If $p_1'$ and $p_2'$ are treated as normal
independent variables and if one proceeds as for $\bar{x} - \bar{y}$,

$M_{p_1'-p_2'}(\theta) = M_{p_1'}(\theta)\, M_{-p_2'}(\theta)$

$= e^{p_1\theta + \frac{p_1 q_1}{n_1}\frac{\theta^2}{2}} \cdot e^{-p_2\theta + \frac{p_2 q_2}{n_2}\frac{\theta^2}{2}}$

$= e^{(p_1 - p_2)\theta + \left(\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}\right)\frac{\theta^2}{2}}$

This demonstrates the following theorem.

Theorem IV. When the number of trials, $n_1$ and $n_2$, are sufficiently
large, the difference of the sample percentages, $p_1' - p_2'$, will be approximately
normally distributed with $m_{p_1'-p_2'} = p_1 - p_2$ and
$\sigma_{p_1'-p_2'} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$.

As for the simple binomial distribution, the normal approximation will
usually be satisfactory in applications if the $n_i p_i$ exceed 5.

2. Applications 

As an illustration of Theorem IV, consider the following problem.
A railroad company installed two sets of 50 red oak ties each. The
two sets were treated with creosote by two different processes. After
a period of twenty years of service, it was found that 22 ties of the
first set and 18 ties of the second set were still in good condition. Is
one justified in claiming that there is no real difference between the
preserving properties of the two processes? To answer this question,
set up the hypothesis that the probability, p, of a tie surviving this
period of service is the same for both processes. Then, from Theorem
IV,

$m_{p_1'-p_2'} = 0$ and $\sigma_{p_1'-p_2'} = \sqrt{\frac{pq}{50} + \frac{pq}{50}} = \frac{\sqrt{pq}}{5}$

The value of p is unknown, and so its value must be estimated from
sample values. Since the hypothesis treats the two samples as though
they were drawn from populations with the same p, the samples may
be combined into one sample of 100 for which there were 40 successes.
Hence a good estimate of p here is 0.4. With this estimate,

$m_{p_1'-p_2'} = 0$ and $\sigma_{p_1'-p_2'} = 0.10$

The situation here is described geometrically in Fig. 2. Since
$p_1' - p_2' = 0.44 - 0.36 = 0.08$ lies well within a two-standard-deviation
interval of the mean, there is no reason for doubting the truth of the
hypothesis at the 0.05 significance level. The fact that the value of
p must be estimated from sample values and that $p_1' - p_2'$ is only
approximately normally distributed makes this test somewhat inaccurate
unless both samples are large. Both samples are sufficiently large
here to insure a fairly reliable test.
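
The pooled estimate and the resulting standard deviation can be computed as in the sketch below. The function name `pooled_sd_of_difference` is hypothetical; the counts are those of the railroad-tie example.

```python
import math

def pooled_sd_of_difference(successes_1, n_1, successes_2, n_2):
    """Standard deviation of p1' - p2' under the hypothesis of a common p.

    The common p is estimated by pooling the two samples.
    """
    p = (successes_1 + successes_2) / (n_1 + n_2)
    q = 1.0 - p
    return math.sqrt(p * q / n_1 + p * q / n_2)

sd_diff = pooled_sd_of_difference(22, 50, 18, 50)
observed = 22 / 50 - 18 / 50

print(round(sd_diff, 3), round(observed, 2), abs(observed) < 2.0 * sd_diff)
```

Carried to three decimals the standard deviation is 0.098, which the text rounds to 0.10. The same function applies unchanged when the two samples differ in size.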

As a second illustration, consider this problem. A civil-service 
examination is given to a group of 200 candidates. On the basis of 
their total scores, the 200 candidates are divided into two groups, the 
upper 30% and the remaining 70%. Consider question one on this 
examination. Among the first group, 40 had the correct answer, 
whereas, among the second group, 80 had the correct answer. Is this 
first question any good for discriminating ability of the type examined 
here? To solve this problem, set up the hypothesis that the question
does not discriminate between the two groups. Then

$p_1' - p_2' = \frac{40}{60} - \frac{80}{140} = 0.10$

$m_{p_1'-p_2'} = 0$

$\sigma_{p_1'-p_2'} = \sqrt{\frac{pq}{60} + \frac{pq}{140}}$

where p is the probability that an individual selected at random from
the population from which this group of 200 is thought of as having
been sampled will get the correct answer to this question. As an estimate
of p, combine the two groups to give 120 successes in 200 trials,
or a value of 0.6. With this approximation for p,

$\sigma_{p_1'-p_2'} = 0.076$

Since the observed difference, $p_1' - p_2' = 0.10$, is less than the critical
difference of 0.152, this result is not significant, and therefore there is
no reason to doubt the truth of the hypothesis.

The formula $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ is but one of many standard-error formulas that can
be derived for solving certain large-sample problems. For example, problem 23
in Chapter VIII is concerned with proving that $\sigma_{s^2} = \sqrt{2(n - 1)}\,\sigma^2/n$ provided
that x is normally distributed. Since it can be shown that the sample variance of
a normal variable possesses an approximate normal distribution for large samples, 
this standard-error formula could be used to solve certain large-sample variance 
problems similar to the mean problems of this chapter. However, more precise 
methods are usually available for such problems. Chapter VIII is concerned with 
such preferred methods. The technique of deriving standard error formulas is 
carefully explained and illustrated in: 

Yule and Kendall, op. cit., pp. 380-411.


EXERCISES 

1. Suggest how to sample randomly from (a) trees in a forest, (b) string being
manufactured, (c) a carload of wheat, (d) households to obtain information about
sizes of families, (e) the public concerning political views.

2. Past experience indicates that wire rods purchased from a company have a
mean breaking strength of 400 pounds and a standard deviation of 15 pounds.
(a) If one rod is selected at random from a lot, between what two values would you
reasonably expect its breaking strength to lie? What assumptions are you making
here? (b) If 25 rods are selected, between what two values would you reasonably
expect their mean to lie? Are the same assumptions necessary here as in (a)?
(c) How many rods would you select so that you would be certain with a probability
of 0.95 that your resulting mean would not be in error by more than 2 pounds?

3. Use Tippett's numbers to draw samples of 10 from the discrete population

    x    0      1      2      3      4
    p    0.30   0.25   0.20   0.15   0.10

Draw 20 (or more) sets of 10 each, and calculate $\bar{x}$ for each set. Graph the histogram
for these 20 (or more) $\bar{x}$'s, and note the approach to normality. Calculate
the mean and standard deviation of these 20 $\bar{x}$'s, and compare with the values to
be expected from theory.

4. The same type of test was given to two classes. The first class of 20 students
averaged 123 points with a standard deviation of 32 points; the second class
of 32 averaged 138 points with a standard deviation of 24 points. Was the second
class superior?

5. The following data give the number of years after marriage in which divorce
occurs, if it occurs. Is one justified in claiming that divorce occurs earlier than
it used to?

                          1887-1906    1929
    Number of divorces    22,500       2,650
    Mean time in years    10.37        9.83
    Standard deviation    8.39         8.26


6. Two different samplers, A and B, were sent into the same forest to select
trees at random. The diameters of trees were measured with the following results.

                               A       B
    Number measured            100     100
    Mean diameter in inches    19.2    20.3
    Standard deviation         3.2     2.6

(a) Does the smaller standard deviation for B imply that he is more accurate
than A? (b) What conclusions can be drawn concerning the relative accuracy of
A and B? (c) If you knew that the true mean was 19.7, could you draw any
further conclusions?

7. In a poll taken among college students, 46 out of 200 fraternity men favored 
a certain proposition while 51 out of 300 non-fraternity men favored it. Was 
there a real difference of opinion on this proposition? 

8. A manufacturer of house dresses sent out advertising by mail. He sent
samples of material to each of two groups of 1,000 women but for one group he
enclosed a white return envelope and for the other group he enclosed a blue envelope;
10% and 13% respectively responded. Would the blue envelope help
sales?

9. A civil-service examination was given to 200 people. On the basis of their
total scores they were divided into the upper 30%, the middle 40%, and the lower
30%. On a certain question, 39 of the upper group got the correct answer while
29 of the lower group got the correct answer. Is this question likely to be useful
for discriminating ability of this type?

10. (a) Construct a control chart for $\bar{x}$ for the following data on the blowing time
of fuses, samples of 5 being taken every hour. (b) If these are the first data taken
on this product, would you say that the process seemed to be under control and
hence that the mean and standard deviation from these data could be used for
future control? (c) If it is known that previous control existed with a mean and
standard deviation about equal to these sample values, would these data justify
some action on the part of the engineer in charge at any time? Each set of 5 has
been arranged in order of magnitude.

    42   42   19   36   42   51   60   18   15   69   64   61
    65   45   24   54   51   74   60   20   30  109   91   78
    75   68   80   69   57   75   72   27   39  113   93   94
    78   72   81   77   59   78   95   42   62  118  109  109
    87   90   81   84   78  132  138   60   84  153  112  136


11. What would you expect the control chart of a given operator to look like 
during an ordinary day's work of 8 hours if he turns out about the same number of 
parts each hour and samples are taken every hour? 

12. With a 10-cm. line some distance away, draw 25 freehand lines, attempting 
to make them 10 cm. long. Cover all lines drawn to avoid being influenced by 
them. Assemble data for the class, and construct a control chart for $\bar{x}$ based on
sets of 5. Each student supplies five $\bar{x}$'s.

13. If $f(x) = e^{-x}$, $x \ge 0$, find the moment-generating function of x.

14. If $f(x) = ce^{-x}x^k$, $x \ge 0$, $k > 0$, find the moment-generating function of $\bar{x}$.
Compare with that of x, and draw conclusions.


FREQUENCY DISTRIBUTIONS OF TWO VARIABLES

LINEAR REGRESSION

Thus far, data of a single variable only have been studied. A large 
percentage of the problems in statistical work, however, involve two 
or more variables. In some problems the variables are studied simultaneously
to see how they are interrelated; in others there is one
particular variable of interest and the remaining variables are studied 
for their possible aid in throwing light on this particular one. In such 
problems the investigator is often interested in using any relationships 
that he finds for making estimates or predictions of the basic variable 
in situations similar to the one at hand. With two variables, the basic 
variable will be denoted by y , and the related variable by means of 
which it is hoped to obtain information concerning y will be denoted 
by x. Then the problem is to determine the relationship between y 
and x in some form convenient for estimating y from x. 

The investigation of the relationship between two variables usually 
begins with an attempt to discover the approximate form of the relationship
by graphing the data as points in the plane. Such a graph is
called a scatter diagram. By means of it one can quickly discern 
whether there is any pronounced relationship and, if so, whether the 
relationship may be treated as approximately linear, that is, whether the 
points tend to follow a straight line. 

Consider the situation when the variables appear to be linearly 
related. In particular, consider the first two columns of Table 1, 
which give the vocabulary test and I.Q. scores of 20 fifth-grade students, 
and Fig. 1, which is the scatter diagram for these data. There appears 
to be a rather strong tendency here for I.Q.’s to increase with increasing 
vocabulary scores. Moreover, the trend appears to be approximately 
linear. Since it is of interest to estimate a student’s I.Q. by means of 
his vocabulary score, the variables have been selected as indicated. 
Hence, the problem is to determine the best-fitting straight line for 
such estimating purposes. 

There are numerous methods for fitting curves to a set of points,
but the most generally satisfactory method is the following one, known
as the method of least squares. Since the desired curve is to be used for
estimating purposes, it is reasonable to require that the curve be such
that it makes the errors of estimation small. If the value of the variable
to be estimated is denoted by y and the corresponding curve value
by y', then the error of estimate is given by y - y'. Since the errors
may be positive or negative and might add up to a small value for a
poorly fitted curve, it would not do to require merely that the sum
of the errors should be as small as possible. This difficulty could be
avoided by requiring that the sum of the absolute values of the errors
should be as small as possible. However, sums of absolute values are
not convenient to work with mathematically; consequently the difficulty
is avoided by requiring that the sum of the squares of the errors
should be a minimum.

Fig. 1. Scatter diagram for I.Q. and vocabulary scores.

Consider the application of this principle to the fitting of a straight
line to a set of n points. Now the equation of any non-vertical line
may be written in the form

(1) $y' = a + m(x - \bar{x})$

where m is its slope and $a - m\bar{x}$ is its y intercept. Then the problem
is to determine the parameters a and m so that the sum of the squares
of the errors of estimation will be a minimum. If the coordinates of
the ith point are denoted by $(x_i, y_i)$, this sum of squares will be
$\sum_{i=1}^{n} (y_i - y_i')^2$. When $y_i'$ is replaced by its value as given by (1), it
becomes clear that this sum is a function of a and m only. If this
function is denoted by G(a, m),

$G(a, m) = \sum_{i=1}^{n} [y_i - a - m(x_i - \bar{x})]^2$

If this function is to have a minimum value, it is necessary that its
partial derivatives vanish there; hence

$\frac{\partial G}{\partial a} = 2\sum [y - a - m(x - \bar{x})][-1] = 0$

$\frac{\partial G}{\partial m} = 2\sum [y - a - m(x - \bar{x})][-(x - \bar{x})] = 0$

where the subscripts and range of summation have been omitted for
convenience. When the summations are performed term by term and
the sums that involve y are transposed, these equations assume the
form

(2) $an + m\sum(x - \bar{x}) = \sum y$
    $a\sum(x - \bar{x}) + m\sum(x - \bar{x})^2 = \sum(x - \bar{x})y$

Since $\sum(x - \bar{x}) = 0$, the solution of these equations is given by

$a = \bar{y}$ and $m = \frac{\sum(x - \bar{x})y}{\sum(x - \bar{x})^2}$

As a result, the least-squares line may be written

(3) $y' - \bar{y} = m(x - \bar{x})$

where

$m = \frac{\sum(x - \bar{x})y}{\sum(x - \bar{x})^2}$

This line is often called the regression line of y on x. It should be noted
that this line passes through the mean point $(\bar{x}, \bar{y})$.

For computational purposes, it is convenient to change the form of m
slightly in the following manner:

$m = \frac{\sum xy - \bar{x}\sum y}{\sum x^2 - 2\bar{x}\sum x + \sum \bar{x}^2} = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}$
Table 1 illustrates the computational procedure for the data mentioned
previously. If a calculating machine is available, only the sums
of the last two columns should be recorded for those columns. As a
result of these computations, the equation of the regression line was
found to be

$y' - 116.4 = 1.132(x - 46.5)$

The graph of this line is shown on the scatter diagram of Fig. 1. 


TABLE 1

      x        y         xy      $x^2$
     30       94      2,820        900
     33       96      3,168      1,089
     29      103      2,987        841
     44      103      4,532      1,936
     35      105      3,675      1,225
     44      105      4,620      1,936
     51      112      5,712      2,601
     48      113      5,424      2,304
     56      113      6,328      3,136
     43      114      4,902      1,849
     47      114      5,358      2,209
     41      115      4,715      1,681
     42      118      4,956      1,764
     61      124      7,564      3,721
     51      126      6,426      2,601
     46      126      5,796      2,116
     56      134      7,504      3,136
     55      135      7,425      3,025
     54      139      7,506      2,916
     64      140      8,960      4,096
    ---    -----    -------    -------
    930    2,329    110,378     45,082
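
The column sums and the slope can be recomputed from the raw scores. The sketch below uses the data of Table 1 and the computational form of m; the variable names and the helper `estimate` are illustrative.

```python
# Vocabulary scores (x) and I.Q. scores (y) from Table 1.
x = [30, 33, 29, 44, 35, 44, 51, 48, 56, 43,
     47, 41, 42, 61, 51, 46, 56, 55, 54, 64]
y = [94, 96, 103, 103, 105, 105, 112, 113, 113, 114,
     114, 115, 118, 124, 126, 126, 134, 135, 139, 140]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)

x_bar, y_bar = sum_x / n, sum_y / n
# Computational form of the slope: (sum xy - n x_bar y_bar) / (sum x^2 - n x_bar^2)
m = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)

def estimate(x_value):
    """Least-squares estimate y' = y_bar + m (x - x_bar)."""
    return y_bar + m * (x_value - x_bar)

print(sum_x, sum_y, sum_xy, sum_x2)
print(round(x_bar, 1), round(y_bar, 2), round(m, 3))
```

The sums agree with the bottom row of the table and the slope comes out as 1.132. The mean of y is 116.45 before rounding, which appears as 116.4 in the regression equation.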


LINEAR CORRELATION 
1. Correlation Coefficient 

After a regression line has been determined, it is of interest to know 
how useful the line is for estimating purposes. One might believe 
that a measure of a line’s usefulness would be given by the standard 
deviation of the errors of estimation, with a small value corresponding 
to accurate estimates and hence to a useful line. However, consider 
the situations illustrated in Figs. 2 and 3. The standard deviation 
of the errors of estimation is about the same in these two illustrations, 
and yet the second line is of no value in helping one to estimate y 
from a knowledge of x . 

If no attempt is made to fit a regression line to a set of data, the best
estimate in the sense of least squares that one could make for y corresponding
to any x would be $\bar{y}$. This is seen by minimizing $\sum(y - a)^2$
as a function of a. Thus,

$\frac{d}{da}\sum(y - a)^2 = 2\sum(y - a)(-1) = 0$

hence, solving for a gives

$a = \frac{\sum y}{n} = \bar{y}$

Now, as a measure of the usefulness of a line for estimating purposes,
it seems natural to consider the ratio of the sum of the squares of the
errors of estimation based on the regression line and the sum of the
squares of the errors when no attempt is made to fit a regression line.
In view of the preceding paragraph, such a measure would be written

$\frac{\sum(y - y')^2}{\sum(y - \bar{y})^2}$

Figs. 2 and 3. Scatter diagrams for different degrees of correlation.

The numerator of this ratio is $\sum e^2$, where $e = y - y'$ denotes an error
of estimation, whereas the denominator is $n s_y^2$.
From (3) it follows that

$\bar{e} = \frac{\sum e}{n} = \frac{\sum(y - y')}{n} = \frac{1}{n}\sum[(y - \bar{y}) - m(x - \bar{x})] = 0$

Consequently,

$\sum e^2 = \sum(e - \bar{e})^2 = n s_e^2$

As a result, the above ratio may be written in the form

(4) $\frac{\sum(y - y')^2}{\sum(y - \bar{y})^2} = \frac{s_e^2}{s_y^2}$

For the situation illustrated in Fig. 3, this ratio would be approximately
1, while for the situation of Fig. 2, it would be a rather small fraction.
Since it is conventional to have zero correspond to a useless line and
to have 1 correspond to a line that estimates perfectly, a preferable
form for this measure of usefulness is obtained by subtracting this
ratio from 1. Then

(5) $r^2 = 1 - \frac{s_e^2}{s_y^2}$

defines the desired measure. The quantity r is called the correlation
coefficient. Thus, the correlation coefficient is a statistic that measures
the usefulness of a regression line for estimating purposes. It will 
have a value close to zero for a line incapable of prediction and close 
to ±1 for a line capable of nearly perfect prediction. The positive 
square root is taken for r if the regression line slopes upward, and the 
negative root if it slopes downward. This convention merely tells one 
whether y increases or decreases as x increases. It is the magnitude 
of r that is important. 
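
As an illustration, the correlation coefficient for the vocabulary and I.Q. data of Table 1 can be computed directly from its definition as 1 minus the ratio of the two sums of squared errors. The sketch below is not a computation carried out in the text; the variable names are illustrative, and the sign of r is attached with `math.copysign` to follow the slope of the regression line.

```python
import math

# Vocabulary scores (x) and I.Q. scores (y) from Table 1.
x = [30, 33, 29, 44, 35, 44, 51, 48, 56, 43,
     47, 41, 42, 61, 51, 46, 56, 55, 54, 64]
y = [94, 96, 103, 103, 105, 105, 112, 113, 113, 114,
     114, 115, 118, 124, 126, 126, 134, 135, 139, 140]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
m = (sum((xi - x_bar) * yi for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))

# Squared errors about the regression line, and about the mean of y.
sum_sq_line = sum((yi - (y_bar + m * (xi - x_bar))) ** 2 for xi, yi in zip(x, y))
sum_sq_mean = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1.0 - sum_sq_line / sum_sq_mean
r = math.copysign(math.sqrt(r_squared), m)   # sign of r follows the slope

print(round(r_squared, 3), round(r, 3))
```

For these data the regression line removes roughly two thirds of the sum of squared errors, giving r of about 0.81.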

Interpretation. The correlation coefficient is usually spoken of as a 
measure of the strength of the relationship between two variables. 
However, it measures the strength of the relationship only when that 
relationship is linear. This statement is qualitative, just as is the 
above statement about r measuring the usefulness of the regression 
line for estimating purposes. In other words, no scale of measurement 
is provided to enable one to compare two sets of data for their relative 
strength of relationship. Thus, a correlation coefficient of 0.6 does not
imply twice as strong a relationship as a correlation coefficient of 0.3,
nor does it imply that the first regression line is four times as useful 
as the second line. However, a quantitative way of looking at a 
correlation coefficient can be obtained in the following manner. 

Since the square of the standard deviation arises so often, it has 
been given a name and is called the variance. Consider, then, the 
following method of dividing the variance of y into two parts. 


$$\frac{1}{n}\sum (y - \bar{y})^2 = \frac{1}{n}\sum (y - y' + y' - \bar{y})^2$$

$$= \frac{1}{n}\left\{\sum (y - y')^2 + 2\sum (y - y')(y' - \bar{y}) + \sum (y' - \bar{y})^2\right\}$$


The value of y' is given by (3). After both sides of (3) are summed it
will be observed that the mean of the estimated values, $\bar{y}'$, is
equal to $\bar{y}$. With these two substitutions,

$$s_y^2 = \frac{1}{n}\left\{\sum (y - y')^2 + 2\sum [y - \bar{y} - m(x - \bar{x})]\,m(x - \bar{x}) + \sum (y' - \bar{y}')^2\right\}$$

$$= \frac{\sum (y - y')^2}{n} + 2m\,\frac{\sum (x - \bar{x})(y - \bar{y})}{n} - 2m^2\,\frac{\sum (x - \bar{x})^2}{n} + \frac{\sum (y' - \bar{y}')^2}{n}$$

From the definition of m in (3), it is easily seen that

$$\sum (x - \bar{x})(y - \bar{y}) = \sum (x - \bar{x})y = m\sum (x - \bar{x})^2$$

and hence that the second and third terms on the right cancel each
other. From previous work leading to (4) it was shown that the first
sum on the right is $s_e^2$; therefore

$$s_y^2 = s_e^2 + s_{y'}^2$$


This result shows that the variance of y can be written as the sum of
the variances of the errors and of the estimated values. Dividing both
sides by $s_y^2$ yields

$$\frac{s_e^2}{s_y^2} + \frac{s_{y'}^2}{s_y^2} = 1 \tag{6}$$

If the first fraction is transposed to the right side, it will be clear from
(5) that

$$r^2 = \frac{s_{y'}^2}{s_y^2} \tag{7}$$


This formula states that $r^2$ is equal to the percentage of $s_y^2$ that
is contributed by the estimated values. Thus, a correlation coefficient may
be interpreted quantitatively by stating that the square of a correlation
coefficient is equal to the percentage of the variance of y that
has been accounted for by the relationship with x. For example, if the
correlation coefficient between the height and diameter of trees of a
given species were 0.80, then 64% of the variation in tree height could
be explained by the relationship of height to diameter. The remaining
36% of $s_y^2$ would be due to other factors. It is customary to speak of
this remaining part of $s_y^2$ as the error variance since it is the variance
of the errors of estimation. It should be noted that because of (6)
the interpretation in terms of percentages must be confined to variances.
If the interpretation were attempted for r and standard deviations, the
percentages would not total 100%. This interpretation in terms of $r^2$
rather than of r has the tendency to curb unwarranted belief in the
strength of relationship between two variables which would arise if r
were treated as the quantitative measure of the relationship, because
$r^2$ is considerably smaller than r, except for r close to 1.

Computation for unclassified data. The definition of r given by (5)
is not convenient for computing r because it requires the computation
of the errors of estimation. A form convenient for computation may
be obtained in the following manner.

Since $\bar{y}' = \bar{y}$, (7) may be written

$$r^2 = \frac{\sum (y' - \bar{y}')^2}{n s_y^2} = \frac{\sum (y' - \bar{y})^2}{n s_y^2}$$


Expressing this in terms of m from (3) gives

$$r^2 = \frac{m^2 \sum (x - \bar{x})^2}{n s_y^2} \tag{8}$$



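Inserting the value of m and taking the positive square root results in

$$r = \frac{s_x}{s_y}\,\frac{\sum (x - \bar{x})y}{\sum (x - \bar{x})^2} = \frac{s_x}{s_y}\,\frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

Hence,

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n s_x s_y} \tag{9}$$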

The positive root was taken here so that the sign of r would agree 
with the sign convention made following (5). The expression (9) is 
often given as the definition of r. The chief objection to (9) as the 
definition of r is that it is quite artificial and does not explain the 
dependence of correlation on linear regression. 

If the value of m given by (8) is inserted in (3), the equation of the 
line of regression can be written in the commonly used form 


$$y' - \bar{y} = r\,\frac{s_y}{s_x}\,(x - \bar{x}) \tag{10}$$

Form (9) is still not the most convenient for computational purposes.
A better form for such purposes is obtained by multiplying out factors
and inserting values for $s_x$ and $s_y$ as follows:

$$r = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{[\sum x^2 - n\bar{x}^2][\sum y^2 - n\bar{y}^2]}} = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}} \tag{11}$$

This form requires the sums of x, y, $x^2$, $y^2$, and xy. If x and y are
positive and are not more than two- or possibly three-digit numbers each,
all these sums can be computed in a single set of operations with a
ten-bank calculating machine that has a split and locking upper dial by
punching x and y on opposite sides of the keyboard. If the machine
does not possess the split upper dial, the sums of x and y, say, must
be computed in a second set of operations. If x and y contain more
digits than those specified above, these sums should be computed by
whatever means seems efficient. For the data of Table 1, which are
plotted in Fig. 1, the only additional computations needed here are
those for $\sum y^2$. Computations give r = 0.81; hence about 66% of the
variance in I.Q. scores can be attributed to the relationship with
vocabulary scores. It appears that I.Q.'s can be estimated fairly well
by means of vocabulary scores.
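
The agreement between the definition (5) and the computing form (11) can be checked numerically. The following Python sketch is only an illustration: the six data pairs are made up for the purpose (they are not the data of Table 1), the function names are hypothetical, and the slope m is taken as $\sum (x - \bar{x})y / \sum (x - \bar{x})^2$, the value inserted in the step leading to (9).

```python
import math

def correlation_by_sums(xs, ys):
    """Formula (11): r from the sums of x, y, x^2, y^2 and xy."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2))

def correlation_by_errors(xs, ys):
    """Definition (5): r^2 = 1 - s_e^2 / s_y^2, with the sign of the slope."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    m = (sum((x - xbar) * y for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    # Errors of estimation about the line y' = ybar + m (x - xbar).
    sum_e2 = sum((y - (ybar + m * (x - xbar))) ** 2 for x, y in zip(xs, ys))
    sum_y2 = sum((y - ybar) ** 2 for y in ys)
    r2 = max(0.0, 1 - sum_e2 / sum_y2)   # guard against rounding below zero
    return math.copysign(math.sqrt(r2), m)

# Hypothetical data, chosen only to exercise the two formulas.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.6]

r_sums = correlation_by_sums(xs, ys)
r_errors = correlation_by_errors(xs, ys)
print(round(r_sums, 4), round(r_errors, 4))
```

Both routes give the same value of r up to rounding; the second requires every error of estimation, which is exactly the labor that (11) avoids.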

Computation for classified data. Data so numerous that the preceding
computations would become unduly lengthy are conveniently
classified with respect to both variables. When data have been
classified, the short method of computation used for finding moments
may be employed with advantage for computing r. Let

$$x_i = c_x u_i + x_0$$

and

$$y_i = c_y v_i + y_0$$

where $c_x$ and $c_y$ are class intervals, and u and v are the new integral
variables. Then

$$x_i - \bar{x} = c_x (u_i - \bar{u})$$

and

$$y_i - \bar{y} = c_y (v_i - \bar{v})$$

Substituting in (9) and simplifying, it will be found that

$$r = \frac{\sum (u - \bar{u})(v - \bar{v})}{n s_u s_v}$$

where it is understood that the summation extends over all values,
whether distinct or not. If only distinct values of $(u - \bar{u})(v - \bar{v})$
were implied, it would be necessary to multiply by the frequency for each
such value. This shows that the correlation coefficient of the variables
u and v is equal to that for x and y. The technique of computing r
by means of the new variables is illustrated in Table 2. These classified
data represent the relationship between the percentage of trend values
for high-grade bond yields, x, and stock sales, y, at the New York
Stock Exchange.

TABLE 2

| y \ x | 94.5 | 96.5 | 98.5 | 100.5 | 102.5 | 104.5 | 106.5 | 108.5 | 110.5 | f_y | v | v f_y | v^2 f_y | uv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29.5 | | | 4 | 3 | | 4 | 1 | | 1 | 13 | -3 | -39 | 117 | -36 |
| 59.5 | 1 | 3 | 6 | 18 | 6 | 9 | 2 | 3 | 1 | 49 | -2 | -98 | 196 | -64 |
| 89.5 | 7 | 3 | 16 | 16 | 4 | 4 | 1 | | 1 | 52 | -1 | -52 | 52 | 23 |
| 119.5 | 5 | 9 | 10 | 9 | 2 | | 1 | 2 | | 38 | 0 | | | |
| 149.5 | 3 | 5 | 8 | 1 | | 1 | | | | 18 | 1 | 18 | 18 | -25 |
| 179.5 | 4 | 2 | 3 | 1 | | | | | | 10 | 2 | 20 | 40 | -38 |
| 209.5 | 4 | 4 | | 1 | | | | | | 9 | 3 | 27 | 81 | -60 |
| 239.5 | 1 | 1 | | | | | | | | 2 | 4 | 8 | 32 | -20 |
| f_x | 25 | 27 | 47 | 49 | 12 | 18 | 5 | 5 | 3 | 191 | | -116 | 536 | -220 |
| u | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | | | | | |
| u f_x | -75 | -54 | -47 | | 12 | 36 | 15 | 20 | 15 | -78 | | | | |
| u^2 f_x | 225 | 108 | 47 | | 12 | 72 | 45 | 80 | 75 | 664 | | | | |
| uv | -54 | -32 | 26 | | -16 | -66 | -24 | -24 | -30 | -220 | | | | |


If x and y are replaced in formula (11) by u and v,

$$r = \frac{(191)(-220) - (-116)(-78)}{\sqrt{[(191)(536) - (-116)^2][(191)(664) - (-78)^2]}} = -0.49$$

The only new feature of these computations is the method of computing
the products of u and v in the last row and column. This sum is computed
two ways to give a check. In computing the entries in the uv
column, for example, it is convenient to start with the first row and
compute the uv terms for it first. Since all uv terms in this row have
the same value of v, namely -3, it is merely necessary to compute the
sum of the u values in this row and then multiply by the common v.
Thus, the third cell contains a frequency of 4 corresponding to u = -1;
hence -4 is mentally recorded. Next, the frequency of 3 corresponds
to u = 0; hence it contributes nothing to the sum. Next, the frequency
of 4 corresponds to u = 2; hence 8 is mentally added to the previous
sum of -4 to give 4. Then the frequency of 1 corresponding to
u = 3 brings the sum to 7, and finally the frequency of 1 corresponding
to u = 5 brings the total to 12. This value is then multiplied by the
common v value of -3 to give -36, which is then recorded. This
procedure is followed for each row and column, all computations being
performed mentally when the frequencies are small.
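
The arithmetic of Table 2 can be retraced with a short Python sketch. It assumes the cell frequencies as read from Table 2 (empty cells entered as zero) and coded values u = -3, ..., 5 and v = -3, ..., 4; the variable names are illustrative only.

```python
import math

# Cell frequencies of Table 2: one row per y class (v = -3, ..., 4),
# one column per x class (u = -3, ..., 5). Empty cells are entered as 0.
freq = [
    [0, 0, 4, 3, 0, 4, 1, 0, 1],
    [1, 3, 6, 18, 6, 9, 2, 3, 1],
    [7, 3, 16, 16, 4, 4, 1, 0, 1],
    [5, 9, 10, 9, 2, 0, 1, 2, 0],
    [3, 5, 8, 1, 0, 1, 0, 0, 0],
    [4, 2, 3, 1, 0, 0, 0, 0, 0],
    [4, 4, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
]
u_vals = list(range(-3, 6))
v_vals = list(range(-3, 5))

n = sum_u = sum_v = sum_uu = sum_vv = sum_uv = 0
for v, row in zip(v_vals, freq):
    for u, f in zip(u_vals, row):
        # Each cell contributes its frequency times the coded values.
        n += f
        sum_u += f * u
        sum_v += f * v
        sum_uu += f * u * u
        sum_vv += f * v * v
        sum_uv += f * u * v

# Formula (11) with x, y replaced by u, v.
r = (n * sum_uv - sum_u * sum_v) / math.sqrt(
    (n * sum_uu - sum_u ** 2) * (n * sum_vv - sum_v ** 2))
print(n, sum_u, sum_v, sum_uu, sum_vv, sum_uv, round(r, 2))
```

The six totals reproduce the marginal entries of the table (191, -78, -116, 664, 536, -220), which serves the same checking purpose as computing the uv sum two ways.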

Cause and effect. The interpretation of a correlation coefficient, 
either qualitatively as a measure of the strength of relationship of two 
variables or quantitatively in terms of $r^2$ as giving the percentage of the
variance of y as accounted for by the relationship, is a purely mathematical
interpretation and is completely devoid of any cause or effect
implications. The fact that two variables tend to increase or decrease 
together does not imply that one has any direct or indirect effect on the 
other. Both of them may be influenced by other variables in such a 
manner as to give rise to a strong mathematical relationship. For 
example, over a period of years the correlation coefficient between 
teachers’ salaries and the consumption of liquor turned out to be 0.90. 
During this period of time there was a steady rise in wages and salaries 
of all types and a general upward trend of good times. Under such 
conditions, teachers’ salaries would also increase. Moreover, the 
general upward trend in wages and buying power would be reflected 
in increased purchases of liquor. Thus, this high correlation merely 
reflects the common effect of the upward trend on the two variables. 
Correlation coefficients must be handled with care if they are to give 
sensible information concerning relationships between pairs of variables. 
Success with correlation coefficients requires familiarity with the field 
of application as well as with their mathematical properties. 

Reliability. In any given problem involving linear correlation, the
value of r may be thought of as the first sample value of a sequence of
possible sample values that would be obtained if repeated sets of similar
data were obtained. Such sets of data are then thought of as samples
of size n drawn at random from some population. This population in
turn is thought of as being represented by a distribution function of
two variables, x and y, which contains a parameter, $\rho$, to serve as a
measure of the relationship between x and y. Then the value of r
may be used to estimate the population parameter $\rho$, just as a sample
mean, $\bar{x}$, is used to estimate the population mean, m.

If the distribution function of x and y is assumed to be of a certain
common type that will be studied in the next chapter, such repeated
sample values of r will follow a known distribution function. The form
and derivation of this distribution function are too complicated to be
considered in this book. Fortunately, there exists a simple change of
variable which transforms this complicated distribution function into
an approximately normal distribution. The normal approximation
may then be used to determine the precision of r as an estimate of $\rho$
in much the same way that the normal distribution of $\bar{x}$ was used to
determine the precision of $\bar{x}$ as an estimate of m. This change of
variable,

$$z = \frac{1}{2}\log_e \frac{1 + r}{1 - r} \tag{12}$$

is such that z will be approximately normally distributed with mean

$$m_z = \frac{1}{2}\log_e \frac{1 + \rho}{1 - \rho}$$

and standard deviation

$$\sigma_z = \frac{1}{\sqrt{n - 3}}$$



As an illustration, consider the following problem. Is a correlation
of r = 0.20 between the face index and the cephalic index of 50 members
of a certain race significant? Set up the hypothesis that $\rho = 0$.
Then the variable z will be approximately normally distributed with
$m_z = 0$ and $\sigma_z = 1/\sqrt{47} = 0.15$. If a significance level of 0.05 is
taken and if the two tails of this normal distribution are used as a
critical region, a sample value of r will be significant if it has a value
of z such that |z| > 0.30. Here,

$$z = \frac{1}{2}\log_e \frac{1.2}{0.8} = 0.20$$

Since this value does not exceed the critical value, the value of r = 0.20
is not significant. A value as large as this would be obtained fairly
often in random samples from a population in which the two variables
were uncorrelated.
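
The test just described can be sketched in a few lines of Python. The function name `fisher_z` is illustrative, and the multiplier 1.96 is an assumption here: it is the usual two-tailed 5% point of the normal distribution, which the text rounds so that the critical value becomes 0.30.

```python
import math

def fisher_z(r):
    """Change of variable (12): z = (1/2) log_e[(1 + r) / (1 - r)]."""
    return 0.5 * math.log((1 + r) / (1 - r))

r, n = 0.20, 50
z = fisher_z(r)                    # sample value of z
sigma_z = 1 / math.sqrt(n - 3)     # standard deviation of z
critical = 1.96 * sigma_z          # two-tailed critical value when rho = 0
significant = abs(z) > critical

print(round(z, 2), round(sigma_z, 2), round(critical, 2), significant)
```

With the unrounded critical value the conclusion is the same as in the text: z does not reach the critical region, so r = 0.20 is not significant for a sample of 50.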


CURVILINEAR REGRESSION 

1. Polynomial Regression 

If a scatter diagram indicates that a straight line will not fit a set
of points satisfactorily because of the non-linearity of the relationship,
it may be possible to find some simple curve that will yield a satisfactory
fit. Unless there are theoretical reasons for expecting a curve
of a certain type to represent the relationship, polynomials are usually
selected because of their simplicity and flexibility. The proper degree
to use can often be determined by an inspection of the scatter diagram.
After the degree has been determined, the best-fitting polynomial of
that degree may then be fitted by the method of least squares.

It will suffice to derive the least-squares equations for a polynomial
of the third degree because the methods are the same for higher degrees.
Let

$$y' = a + bx + cx^2 + dx^3 \tag{13}$$

represent such a polynomial. Then the sum of the squares of the errors
of estimation, $\sum_{i=1}^{n} [y_i - y_i']^2$, will be a function of the four
parameters a, b, c, and d only. If this function is denoted by
G(a, b, c, d), then

$$G(a, b, c, d) = \sum [y - a - bx - cx^2 - dx^3]^2 \tag{14}$$

In order that this function shall have a minimum value, it is necessary
that

$$\frac{\partial G}{\partial a} = \frac{\partial G}{\partial b} = \frac{\partial G}{\partial c} = \frac{\partial G}{\partial d} = 0$$

there. Differentiation of (14) produces the equations

$$2\sum [y - a - bx - cx^2 - dx^3][-1] = 0$$

$$2\sum [y - a - bx - cx^2 - dx^3][-x] = 0$$

$$2\sum [y - a - bx - cx^2 - dx^3][-x^2] = 0$$

$$2\sum [y - a - bx - cx^2 - dx^3][-x^3] = 0$$

If the quantities in brackets are multiplied out and the individual
terms summed, these equations will reduce to the form

$$an + b\sum x + c\sum x^2 + d\sum x^3 = \sum y$$

$$a\sum x + b\sum x^2 + c\sum x^3 + d\sum x^4 = \sum xy$$

$$a\sum x^2 + b\sum x^3 + c\sum x^4 + d\sum x^5 = \sum x^2 y$$

$$a\sum x^3 + b\sum x^4 + c\sum x^5 + d\sum x^6 = \sum x^3 y$$

These equations are called the normal equations of least squares.
Their solution when substituted in (13) yields the desired polynomial
of best fit.
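
The normal equations above can be formed and solved mechanically. The following Python sketch builds them from the power sums and solves them by Gaussian elimination; the function name `fit_polynomial` and the test data are hypothetical, and elimination with partial pivoting is one convenient choice of solution method, not one prescribed by the text.

```python
def fit_polynomial(xs, ys, degree=3):
    """Solve the least-squares normal equations; returns [a, b, c, ...]."""
    k = degree + 1
    # Power sums: sum x^0 (= n), sum x, ..., sum x^(2*degree).
    sx = [sum(x ** p for x in xs) for p in range(2 * degree + 1)]
    # Right-hand sides: sum y, sum xy, sum x^2 y, ...
    sxy = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(k)]
    # Augmented matrix: equation j has coefficients sum x^(i+j), i = 0..degree.
    a = [[sx[i + j] for i in range(k)] + [sxy[j]] for j in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, k):
            factor = a[row][col] / a[col][col]
            for c in range(col, k + 1):
                a[row][c] -= factor * a[col][c]
    # Back substitution.
    coef = [0.0] * k
    for row in range(k - 1, -1, -1):
        tail = sum(a[row][c] * coef[c] for c in range(row + 1, k))
        coef[row] = (a[row][k] - tail) / a[row][row]
    return coef

# Hypothetical points lying exactly on y = 1 + 2x - 0.5x^2 + 0.25x^3.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x - 0.5 * x ** 2 + 0.25 * x ** 3 for x in xs]
a_hat, b_hat, c_hat, d_hat = fit_polynomial(xs, ys)
print(a_hat, b_hat, c_hat, d_hat)
```

Because the points lie exactly on a cubic, the fitted coefficients reproduce those of the generating polynomial up to rounding, which is a simple check on the equations.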

2. Functional Linear Regression 

There are numerous non-linear relationships in science in which the
explicit form of the relationship is given from theoretical considerations.
In such situations the fundamental problem is to find estimates
of the parameters that are needed to determine the equation of the
curve representing the relationship. For example, the equation

$$p v^{\gamma} = \text{constant}$$

represents the relation between the pressure and volume of an ideal
gas undergoing adiabatic change. Here $\gamma$ is a parameter whose value
depends upon the particular gas and for which an estimate may be
obtained experimentally.

The problem of fitting non-polynomial curves to a set of points is
not nearly as simple as that of fitting polynomials. It will be discovered
that the technique of least squares in such problems often
gives rise to normal equations that can be solved only by tedious
numerical methods. Sometimes it is possible to reduce the equation
of the curve to a simpler form by considering functions of the variables
as new variables. As an illustration, the equation $y = ae^{bx}$, which
arises for example in the study of simple growth phenomena, can be
reduced to linear form by taking logarithms of both sides. Then

$$\log_e y = \log_e a + bx$$

This equation may be written in the form

$$Y = A + bx$$

In this form it would be possible to fit a straight line to the set of points
plotted in the x, $\log_e y$ plane, or to the set of points x, y plotted on
semi-log paper. This least-squares line, of course, is not equivalent to the
least-squares exponential curve fitted to the original set of points; however,
it usually differs very little from it.

This same technique could be applied to a curve of the type $y = ax^b$,
which arises, for example, in certain engineering problems.
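
The logarithmic reduction of $y = ae^{bx}$ can be sketched in Python as follows. The function names and data are hypothetical; the points are generated exactly from a = 2, b = 0.5 so that the recovered parameters can be checked. As noted above, with scattered data the result is the least-squares line in the x, $\log_e y$ plane, not the least-squares exponential in the original plane.

```python
import math

def fit_line(xs, ys):
    """Least-squares line ys = A + b * xs; returns (A, b)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))
    return ybar - b * xbar, b

def fit_exponential(xs, ys):
    """Fit y = a * exp(b x) by fitting a line to (x, log_e y)."""
    big_a, b = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(big_a), b      # a = e^A because A = log_e a

# Hypothetical growth data lying exactly on y = 2 e^(0.5 x).
xs = [0, 1, 2, 3, 4]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat = fit_exponential(xs, ys)
print(round(a_hat, 6), round(b_hat, 6))
```

For a curve of the type $y = ax^b$ the same helper applies if both x and y are replaced by their logarithms before the line is fitted.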

Techniques of this type fail on a curve of the form $y = c + ae^{bx}$,
which arises in the study of more complex growth phenomena and
certain chemical reactions. In such situations it is often possible to
determine satisfactory values of the parameters by passing the curve
through a sufficient number of carefully selected points of the set to
obtain as many equations to be satisfied as there are parameters.

The various methods that have been discussed briefly in this section
for determining curvilinear regression equations are justified by
convenience rather than by strong theoretical considerations.


CURVILINEAR CORRELATION

Since r is defined in terms of the least-squares line, it is clear that r
will serve as a satisfactory measure of relationship only when the general
trend of the scatter diagram is approximately linear. If the relationship
between x and y is strong but curvilinear, it may happen that r will
be small numerically and thus give a false impression of the true
relationship. For example, if x and y possessed a semicircular type of
relationship as indicated in Fig. 4, the least-squares line would
be approximately horizontal and hence r would be very small, in
spite of the strong relationship. It is therefore necessary to define
a measure of relationship that will hold for non-linear relationships.
The obvious approach is to use the definition of r given by (5) with
the understanding that $s_e^2$ is to be the variance of the errors of
estimation based on the least-squares polynomial rather than on the
least-squares line. This generalized version of r is called the correlation
index. It is interpreted in much the same manner as the correlation
coefficient, although its computation is quite different.

Fig. 4. Hypothetical distribution with strong non-linear relationship but
weak linear correlation.
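
The semicircular case can be made concrete with a small Python sketch. The points below are hypothetical: they lie exactly on the upper half of the unit circle, so y is completely determined by x, and r is computed from formula (9).

```python
import math

def correlation(xs, ys):
    """Formula (9): r = sum (x - xbar)(y - ybar) / (n s_x s_y)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    s_y = math.sqrt(sum((y - ybar) ** 2 for y in ys) / n)
    cross = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return cross / (n * s_x * s_y)

# Hypothetical points on the semicircle x^2 + y^2 = 1, y >= 0.
xs = [i / 10 for i in range(-10, 11)]
ys = [math.sqrt(1 - x * x) for x in xs]

r = correlation(xs, ys)
print(abs(r) < 1e-9)
```

The relationship is exact, yet r is zero apart from rounding, because the points are symmetric about the vertical axis and the least-squares line is horizontal. A measure based on a fitted polynomial, as in the correlation index, does not share this defect.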


REFERENCES 

An interesting interpretation of r in terms of common elements, as well as a
consideration of other measures of correlation, may be found in:

Peters and Van Voorhis, Statistical Procedures and Their Mathematical Bases,
McGraw-Hill Book Company, pp. 101-109, 118-123.

The derivation of the distribution function of r is very complicated; however, 
the student with sufficient mathematical background may find it in: 

Wilks, S. S., Mathematical Statistics, Princeton University Press, pp. 116-120. 

Kendall, M. G., The Advanced Theory of Statistics, Griffin and Company,
vol. 1, pp. 339-342.

The transformation used to approximately normalize the distribution of r was 
originated by R. A. Fisher. A discussion of this transformation and its adequacy 
will be found in: 

Fisher, R. A., Statistical Methods for Research Workers, 8th ed., Oliver & Boyd,
pp. 190-195.

Kendall, op. cit., pp. 345-346. 





If a high-degree polynomial is to be used to fit a set of points, the solution of the
normal equations involves much computational labor. For such problems it is
advisable to follow an efficient computational procedure such as the Doolittle
technique. This procedure is outlined in:

Croxton and Cowden, Applied General Statistics, Prentice-Hall, pp. 716-720. 

If a polynomial is being fitted to time data in which there corresponds one value
of y to each value of t, and if the values of t are equally spaced, the normal equations
can be simplified a great deal by choosing $x = t - \bar{t}$, because then $\sum x^k = 0$ for k
odd. The normal equations will then be found to split into two simpler sets of
equations.

If the degree of the polynomial to be used is uncertain and it is fairly likely that
higher-degree terms will be added after the first fitting attempt, a better procedure
exists in the form of orthogonal polynomials. These polynomials possess the
desirable property of leaving unchanged the coefficients of the previously fitted
polynomial when higher-degree terms are added. If orthogonal polynomials were not
used in such a situation, the entire set of coefficients would have to be recomputed.
The technique of orthogonal polynomials is explained in:

Fisher, op. cit., pp. 140-146. 

Further material on empirical curve fitting may be found in: 

Richardson, C. H., An Introduction to Statistical Analysis, Harcourt, Brace and
Company, pp. 306-361.

Kenney, J. F., Mathematics of Statistics, Part One, D. Van Nostrand Company,
pp. 130-158.

A measure of correlation closely related to the correlation index is the correlation
ratio. It is often used for measuring non-linear correlation because it does not
require the fitting of a curve to obtain errors of estimation but bases the errors upon
the means of columns after the data have been classified. A brief discussion of
this measure may be found in the preceding reference on pages 198-204.


EXERCISES 

1. The following data are for the amount of water applied in inches and the
yield of alfalfa in tons per acre. (a) Find the equation of the line of regression.
(b) Calculate r.

| Water (x) | 12 | 18 | 24 | 30 | 36 | 42 | 48 | 60 |
|---|---|---|---|---|---|---|---|---|
| Yield (y) | 5.27 | 5.68 | 6.25 | 7.21 | 8.20 | 8.71 | 8.42 | 8.24 |


2. The following data are for tensile strength (100 lb./in.²) and hardness
(Rockwell E) of die-cast aluminum. (a) Calculate r. (b) Calculate the standard error
of estimate using this value of r. (c) From (b), using the mean tensile strength,
determine an interval of percentage errors within which about 50% of such
percentage errors will fall.

| Tensile strength | 293 | 349 | 368 | 301 | | | | 313 | 322 | 334 |
|---|---|---|---|---|---|---|---|---|---|---|
| Hardness | 53 | 70 | 84 | 55 | 78 | 64 | 71 | 53 | | 67 |

| Tensile strength | 377 | 247 | 348 | 298 | 287 | 292 | 345 | | 257 | 258 |
|---|---|---|---|---|---|---|---|---|---|---|
| Hardness | 70 | 56 | 86 | 60 | 72 | 51 | 88 | 95 | 51 | 75 |

| Tensile strength | 265 | 281 | 246 | 258 | 237 | 286 | 324 | 282 | 340 |
|---|---|---|---|---|---|---|---|---|---|
| Hardness | 54 | 78 | 52 | 69 | 54 | 64 | 83 | 56 | 70 |



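3. The following data are for intelligence-test scores, grade point averages, and
reading rates of students. Calculate r between I.T. scores and G.P.A. by
classifying the data with respect to both variables.

| I.T. | 295 | 152 | 214 | 171 | 131 | 178 | 225 | 141 | 116 | 173 |
|---|---|---|---|---|---|---|---|---|---|---|
| G.P.A. | 2.4 | 0.6 | 0.2 | 0.0 | 1.0 | 0.6 | 1.0 | 0.4 | 0.0 | 2.6 |
| R.R. | 41 | 18 | 45 | 29 | 28 | 38 | 25 | 26 | 22 | 37 |

| I.T. | 230 | 174 | 177 | 210 | 236 | 198 | 217 | 143 | 186 | 233 |
|---|---|---|---|---|---|---|---|---|---|---|
| G.P.A. | 2.6 | 1.8 | 0.0 | 0.4 | 1.8 | 0.8 | 1.0 | 0.2 | 2.8 | 1.4 |
| R.R. | 39 | 24 | 32 | 26 | 29 | 34 | 38 | 40 | 27 | 44 |

| I.T. | 136 | 183 | 223 | 106 | 134 | 211 | 151 | 231 | 135 | 146 |
|---|---|---|---|---|---|---|---|---|---|---|
| G.P.A. | 0.2 | 0.4 | 1.4 | 0.0 | 0.8 | 0.8 | 0.4 | 2.2 | 1.4 | 1.2 |
| R.R. | 32 | 26 | 50 | 24 | 48 | 18 | 20 | 26 | 26 | 19 |

| I.T. | 227 | 204 | 223 | 142 | 176 | 238 | 268 | 163 | 195 | 184 |
|---|---|---|---|---|---|---|---|---|---|---|
| G.P.A. | 1.4 | 1.4 | 1.4 | 0.8 | 0.8 | 2.6 | 2.6 | 0.2 | 0.0 | 0.8 |
| R.R. | 35 | 26 | 18 | 22 | 23 | 27 | 40 | 33 | 38 | 32 |

| I.T. | 192 | 121 | 316 | 234 | 146 | 261 | 175 | 233 | 261 | 242 |
|---|---|---|---|---|---|---|---|---|---|---|
| G.P.A. | 0.8 | 0.6 | 2.6 | 1.2 | 0.6 | 2.6 | 1.2 | 1.6 | 2.4 | 1.4 |
| R.R. | 22 | 34 | 42 | 41 | 18 | 35 | 30 | 34 | 25 | 49 |
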

4. Would you consider students' high-school marks highly useful for predicting
college marks if the correlation coefficient between them was 0.40?

5. For the following data on the yield of wheat in bushels per acre and the
number of pounds of nitrogen applied per acre (a) calculate r, (b) fit a polynomial of
the second degree, (c) calculate the index of correlation, (d) compare these two
measures of correlation.


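| Yield \ Nitrogen | 0-20 | 20-40 | 40-60 | 60-80 | 80-100 | 100-120 | 120-140 | 140-160 | 160-180 |
|---|---|---|---|---|---|---|---|---|---|
| 32-36 | | | | 6 | 15 | 10 | 4 | 6 | 2 |
| 28-32 | | | 1 | 18 | 20 | 9 | 5 | 1 | |
| 24-28 | | 1 | 15 | 20 | 3 | | | | |
| 20-24 | | 2 | 12 | | | | | | |
| 16-20 | | 10 | 2 | | | | | | |
| 12-16 | | 8 | | | | | | | |
| 8-12 | | 4 | | | | | | | |
| 4-8 | 10 | | | | | | | | |
| 0-4 | 6 | | | | | | | | |
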



6. What explanation would you give if told that r between fertilizer added and
profit made in raising vegetables on a certain experimental farm was only 0.30?

7. Test the hypothesis that the population correlation in problem 2 is equal to
0.6.

8. How large a correlation coefficient is needed for a sample of size 25 before
one is justified in claiming that the variables are related?

9. Prove that $s_{x-y}^2 = s_x^2 + s_y^2 - 2 r s_x s_y$.

10. Find $s^2$ for the first n positive integers by using the formula for the sum of
the squares of these integers.

11. If x and y are the ranks of an individual, prove that

$$r_r = 1 - \frac{6\sum (x - y)^2}{n(n^2 - 1)}$$

where $r_r$ is the correlation of the ranks for n individuals, by utilizing the results of
problems 9 and 10.

12. By means of the formula of problem 11, find the correlation of ranks for the
data of Table 1.

13. Prove that the regression line fitted to the means of columns when weighted
with column frequencies is the same as the ordinary regression line for least squares.

14. Consider the following coin-tossing experiment. Toss 3 pennies and 2
dimes. Let x be the total number of heads showing. Pick up the 3 pennies and
toss them again. Let y be the total number of heads now showing. Perform the
experiment 25 times, and calculate r. Calculate the theoretical value of r by
finding the expected frequencies in various cells from probability considerations. This
latter result, when generalized, shows how a correlation coefficient may be
interpreted in terms of common elements.

15. The following data give the velocity of the Mississippi River in feet per
second corresponding to various depths expressed in terms of the ratio, D, of the
measured depth to the depth of the river. (a) Fit a parabola $V = a + bD + cD^2$
to the data, choosing a convenient origin. (b) Find V when D = 0.9 (observed
V = 2.976). (c) When would you consider extrapolation as used in (b) a valid
procedure?

| D | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 |
|---|---|---|---|---|---|---|---|---|---|
| V | 3.195 | 3.230 | 3.253 | 3.261 | 3.252 | 3.228 | 3.181 | 3.127 | 3.059 |


16. The following data are for a growing plant. (a) Plot the data on ordinary,
semi-log, and log-log paper. (b) Fit a simple exponential

| Day | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| Height (inches) | 0.75 | 1.20 | 1.75 | 2.50 | 3.45 | 4.70 | 6.20 | 8.25 | 11.50 |

17. The pressure of a gas and its volume are related by an equation of the form
$p v^a = b$. In a certain experiment the following values were obtained.
Determine a and b by least squares on the logarithmic equation.

| p (kg./cm.²) | 0.5 | 1.0 | 1.5 | 2.0 | 2.5 | 3.0 |
|---|---|---|---|---|---|---|
| v (liters) | 1.62 | 1.00 | 0.75 | 0.62 | 0.52 | 0.46 |


18. Derive the least-squares equations for fitting a modified exponential,
$y = c + ae^{bt}$, to a set of n points, and indicate why these equations would be
difficult to solve.



CHAPTER VI 


THEORETICAL FREQUENCY DISTRIBUTIONS 
OF TWO VARIABLES 


ADDITIONAL PROPERTIES OF DISTRIBUTIONS OF TWO VARIABLES 


1. Discrete Probability 

Consider the multiplication rule of probability for discrete variables
given by (20), Chapter III, written in the form

$$P(x, y) = P(x)P_x(y) \tag{1}$$

Here x will be treated as a discrete variable which takes on different
values corresponding to the different results of the trials of an event.
For example, if a die is being rolled, x will assume an integral value
from 1 to 6; or, if a coin is being tossed, x will assume a value of 0 or
1, 0 corresponding to a tail, say, and 1 corresponding to a head. The
variable y will be treated similarly with respect to a second event.
Hence, to every possible result of two events there will correspond a
point in the x, y plane to which will be attached a probability P(x, y).

Since $P_x(y)$ gives the probability distribution of y for a fixed value
of x, the sum of $P_x(y)$ over all possible values of y for this fixed value
of x must be 1; consequently if both sides of (1) are summed over
these values of y,

$$\sum_y P(x, y) = P(x) \tag{2}$$

In a similar manner it follows that

$$\sum_x P(x, y) = P(y)$$

These formulas show that, if one has the joint probability distribution
for two variables and desires the probability distribution of one of them,
it is merely necessary to sum the joint probability function over all
values of the other. P(x) and P(y) are called the marginal distribution
functions of P(x, y).

If formula (1) is written in the form

$$P_x(y) = \frac{P(x, y)}{P(x)} \tag{3}$$

it shows that if one has the joint probability distribution for two
variables and desires the conditional probability distribution for one of
them when the other is held fixed, it is merely necessary to divide this
joint probability function by the marginal distribution function of the
fixed variable. $P_y(x)$ is obtained and interpreted in a similar manner.
The two functions $P_x(y)$ and $P_y(x)$ are often called the x and y array
distribution functions of P(x, y).

For the purpose of illustrating these ideas, suppose that a bag contains
4 white and 2 black balls and that 2 balls are drawn from the bag. Let
x and y represent the results of the two drawings, 0 corresponding to a
black ball (failure) and 1 corresponding to a white ball (success). Then
every possible result will be represented by one of the four points
indicated in Fig. 1. From the contents of the bag and formula
(1) it follows directly that

$$P(0, 0) = P(0)P_0(0) = \tfrac{2}{6} \cdot \tfrac{1}{5} = \tfrac{1}{15}$$

$$P(0, 1) = P(0)P_0(1) = \tfrac{2}{6} \cdot \tfrac{4}{5} = \tfrac{4}{15}$$

$$P(1, 0) = P(1)P_1(0) = \tfrac{4}{6} \cdot \tfrac{2}{5} = \tfrac{4}{15}$$

$$P(1, 1) = P(1)P_1(1) = \tfrac{4}{6} \cdot \tfrac{3}{5} = \tfrac{6}{15}$$

Fig. 1. Discrete probability distribution.

If, instead, it is assumed that only the final values of the P(x, y) just
calculated are known, the x marginal distribution, for example, could
be obtained by applying (2). Here

$$P(0) = P(0, 0) + P(0, 1) = \tfrac{1}{15} + \tfrac{4}{15} = \tfrac{1}{3}$$

$$P(1) = P(1, 0) + P(1, 1) = \tfrac{4}{15} + \tfrac{6}{15} = \tfrac{2}{3}$$

Finally, with these same assumptions, the x array distribution for
x = 1, for example, could be obtained by applying (3). Here

$$P_1(0) = \frac{P(1, 0)}{P(1)} = \tfrac{2}{5}$$

$$P_1(1) = \frac{P(1, 1)}{P(1)} = \tfrac{3}{5}$$

2. Probability Density 

For a continuous distribution function of two variables, f(x, y), it is
often convenient to think of f(x, y) as representing a probability density
distribution in the x, y plane, just as P(x, y) is often thought of as
giving the masses for a discrete set of points. Figure 2, without the
curve, illustrates this manner of interpretation for two continuous
variables, just as Fig. 1 does for two discrete variables. In this
connection, it is helpful to conceive of the x, y plane as being a metal sheet
whose thickness at any point x, y is proportional to f(x, y) and whose
total mass is 1. This corresponds to the situation for one variable in
which the x axis is thought of as a wire whose thickness at any point x
is proportional to f(x) and whose total mass is 1.

On other occasions it is convenient 
to think of f(x, y) as representing a 
surface in three dimensions with 
properties analogous to those of a 
distribution curve for one variable. 
From this point of view, because of 
(1), Chapter IV, it follows that the 
probability that a point x, y will lie in a given rectangle in the x, y plane
is equal to the volume under the surface z = f(x, y) which lies above 
this rectangle. The total volume under this surface and above the 
x, y plane is, of course, 1. 

From the density point of view, the probability that a point will lie 
in a given rectangle is equal to the mass of this rectangle. Both these 
physical interpretations of probability clearly hold for regions other 
than rectangles in the x, y plane. 



Fig. 2. Probability density distribution and curve of regression.


3. Marginal Distributions 

For the purpose of obtaining a formula for a continuous variable
corresponding to (2), consider

$$P[\alpha < x < \beta] = P\begin{bmatrix} \alpha < x < \beta \\ a_2 < y < b_2 \end{bmatrix} = \int_\alpha^\beta \int_{a_2}^{b_2} f(x, y)\,dy\,dx = \int_\alpha^\beta g(x)\,dx$$

where as usual $a_2$, $b_2$ indicates the total range of y values and where

$$g(x) = \int_{a_2}^{b_2} f(x, y)\,dy \tag{4}$$

Now, if x is considered independently of y, then by definition

$$P[\alpha < x < \beta] = \int_\alpha^\beta f(x)\,dx$$

If these two expressions for P are equated,

$$\int_\alpha^\beta g(x)\,dx = \int_\alpha^\beta f(x)\,dx \tag{5}$$


Since this equality is to hold for all intervals ($\alpha$, $\beta$), $\alpha$ may be held
fixed and $\beta$ allowed to vary, in which event these integrals may be
treated as functions of $\beta$. By a well-known calculus formula, if

$$F(\beta) = \int_\alpha^\beta f(x)\,dx$$

then

$$\frac{dF(\beta)}{d\beta} = f(\beta)$$

If (5) is differentiated with respect to $\beta$, this formula may be applied to
give

$$g(\beta) = f(\beta)$$

Since this is an identity in $\beta$, it follows from (4) that

$$\int_{a_2}^{b_2} f(x, y)\,dy = f(x) \tag{6}$$

This formula corresponds to (2) for the discrete case. In a similar
manner, the integration of f(x, y) with respect to x over its range
($a_1$, $b_1$) would yield f(y). The functions f(x) and f(y) are called the
marginal distribution functions of f(x, y). From the density point of
view, f(x) represents the probability density distribution on the x axis
after the entire mass in the x, y plane has been projected onto the x
axis.
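Formula (6) can be tried numerically. The sketch below is an illustration only: the density f(x, y) = x + y on the unit square is a hypothetical choice that does not come from the text, and the midpoint rule is merely one convenient way of approximating the integral. For this density the marginal should be f(x) = x + 1/2.

```python
def f(x, y):
    """Hypothetical joint density on the unit square: f(x, y) = x + y."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def integrate(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def marginal_x(x):
    """Formula (6): integrate f(x, y) over the full range of y."""
    return integrate(lambda y: f(x, y), 0.0, 1.0)


for x in (0.0, 0.25, 0.5, 1.0):
    print(x, round(marginal_x(x), 6), x + 0.5)

# Projecting all the mass onto the x axis must leave a total mass of 1.
total_mass = integrate(marginal_x, 0.0, 1.0, n=200)
print("total mass:", round(total_mass, 6))
```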


4. Array Distributions 

By analogy with (3), consider the function defined by

$$f_x(y) = \frac{f(x, y)}{f(x)} \tag{7}$$


If x is held fixed but is such that f(x) > 0, (7) defines a non-negative
function of y for which, because of (6),

$$\int_{a_2}^{b_2} f_x(y)\,dy = \int_{a_2}^{b_2} \frac{f(x, y)}{f(x)}\,dy = \frac{1}{f(x)} \int_{a_2}^{b_2} f(x, y)\,dy = 1$$


Thus, $f_x(y)$ has properties that enable it to serve as a distribution
function of y. It is often called the conditional distribution function of y
because it gives the probability distribution of y for a fixed value of
x, just as $P_x(y)$ in (3) does for a discrete variable. It is also called the
x array distribution function of f(x, y). The function $f_y(x)$ is defined
and named in an analogous manner.

From a density point of view, $f_x(y)$ may be thought of as giving the
probability density distribution along the vertical line corresponding
to the fixed value of x, the total mass of this line being 1. The density
function f(x, y) as it stands could not be used as a probability density
function along such a line because by (6) it would not give a total
probability of 1 unless f(x) were equal to 1. The factor 1/f(x) insures
that the total mass of the line will be 1.


5. Curve of Regression

Consider the mean value of the array distribution function $f_x(y)$.
If this mean value is denoted by $\bar{y}_x$, then by definition and (7)

$$\bar{y}_x = \int_{a_2}^{b_2} y f_x(y)\,dy = \int_{a_2}^{b_2} y\,\frac{f(x, y)}{f(x)}\,dy \tag{8}$$

Now $\bar{y}_x$ is evidently a function of x; hence it defines a curve in the x, y
plane which is called the curve of regression of y on x. Thus, the curve
of regression of y on x is the locus of the mean points of the x array
distributions. The situation is illustrated in Fig. 2.
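As a rough numerical companion to (7) and (8), the sketch below builds an x array distribution and its mean for the hypothetical density f(x, y) = x + y on the unit square. Neither the density nor the function names come from the text. For this density the curve of regression works out to (3x + 2)/(6x + 3), which is not a straight line.

```python
def f(x, y):
    """Hypothetical joint density on the unit square: f(x, y) = x + y."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def integrate(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def array_density(x):
    """Formula (7): the x array distribution f_x(y) = f(x, y) / f(x)."""
    fx = integrate(lambda y: f(x, y), 0.0, 1.0)
    return lambda y: f(x, y) / fx


def regression_mean(x):
    """Formula (8): the mean of the x array distribution."""
    fxy = array_density(x)
    return integrate(lambda y: y * fxy(y), 0.0, 1.0)


for x in (0.0, 0.5, 1.0):
    exact = (3.0 * x + 2.0) / (6.0 * x + 3.0)
    print(x, round(regression_mean(x), 6), round(exact, 6))
```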


6. Product Moments 

Product moments about the origin and about the mean for two
variables are defined by

$$\mu'_{pq} = \int_{a_2}^{b_2} \int_{a_1}^{b_1} x^p y^q f(x, y)\,dx\,dy \tag{9}$$

and

$$\mu_{pq} = \int_{a_2}^{b_2} \int_{a_1}^{b_1} (x - m_x)^p (y - m_y)^q f(x, y)\,dx\,dy \tag{10}$$

The product moment $\mu_{11}$, called the covariance, is of particular interest
because the theoretical correlation coefficient, $\rho$, is defined in terms of
it by means of

$$\rho = \frac{\mu_{11}}{\sigma_x \sigma_y} \tag{11}$$

From (9) and (10), with the aid of (6), it follows that

$$\mu'_{00} = 1, \quad \mu'_{10} = m_x, \quad \mu'_{01} = m_y, \quad \mu_{20} = \sigma_x^2, \quad \mu_{02} = \sigma_y^2$$

From (10) it will be observed that $\mu_{11}$ is the theoretical counterpart
of $\sum (x - \bar{x})(y - \bar{y})/n$ in the numerator of (9), Chapter V, and therefore
that $\rho$ is the theoretical counterpart of r.
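The definitions (9) to (11) can be exercised on a density whose moments are available in closed form. The sketch below assumes the hypothetical density f(x, y) = x + y on the unit square, which is not taken from the text; its moments about the origin are 1/((p+2)(q+1)) + 1/((p+1)(q+2)). The moments about the mean are obtained from these by expanding the products in (10).

```python
from fractions import Fraction
import math


def raw_moment(p, q):
    """Moment about the origin, as in (9), for f(x, y) = x + y on the unit square."""
    return Fraction(1, (p + 2) * (q + 1)) + Fraction(1, (p + 1) * (q + 2))


m_x = raw_moment(1, 0)
m_y = raw_moment(0, 1)

# Moments about the mean, as in (10), expressed through moments about the origin.
var_x = raw_moment(2, 0) - m_x**2
var_y = raw_moment(0, 2) - m_y**2
covariance = raw_moment(1, 1) - m_x * m_y

# Formula (11): the theoretical correlation coefficient.
rho = float(covariance) / math.sqrt(float(var_x) * float(var_y))

print("total mass:", raw_moment(0, 0))
print("means:", m_x, m_y)
print("variances:", var_x, var_y)
print("covariance:", covariance)
print("rho:", rho)
```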


NORMAL DISTRIBUTION FUNCTION OF TWO VARIABLES 


1. Definition 


The definition here arises from a generalization of that for one 
variable. However, rather than start with a quadratic exponential 
function of a certain type and then determine the parameters of this 
function in terms of familiar statistical quantities as was done with 
one variable, here the results of such determinations will be used. 
Hence, a normal distribution function of two variables will be defined 
by 


$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-m_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-m_x}{\sigma_x}\right)\left(\frac{y-m_y}{\sigma_y}\right) + \left(\frac{y-m_y}{\sigma_y}\right)^2\right]\right\} \tag{12}$$

It will be found that this function possesses the essential properties
for a distribution function. Furthermore, it will be found that the
parameters $m_x$, $m_y$, $\sigma_x$, $\sigma_y$, and $\rho$ are consistent with their values as given
by the product moment definitions in (11) and immediately following.


2. Marginal Distributions 

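If (6) is applied to (12), the x marginal distribution function will
be given by

$$f(x) = \int_{-\infty}^{\infty} \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-m_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-m_x}{\sigma_x}\right)\left(\frac{y-m_y}{\sigma_y}\right) + \left(\frac{y-m_y}{\sigma_y}\right)^2\right]\right\} dy$$

Let $u = (x - m_x)/\sigma_x$ and $v = (y - m_y)/\sigma_y$; then $dy = \sigma_y\,dv$ and

$$f(x) = \frac{1}{2\pi\sigma_x\sqrt{1-\rho^2}} \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[u^2 - 2\rho uv + v^2\right]\right\} dv$$

Adding and subtracting $\rho^2u^2$ to the exponent in order to complete the
square in v gives

$$f(x) = \frac{1}{2\pi\sigma_x\sqrt{1-\rho^2}} \int_{-\infty}^{\infty} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[v^2 - 2\rho uv + \rho^2u^2 - \rho^2u^2 + u^2\right]\right\} dv$$

$$= \frac{e^{-u^2/2}}{2\pi\sigma_x\sqrt{1-\rho^2}} \int_{-\infty}^{\infty} \exp\left\{-\frac{(v - \rho u)^2}{2(1-\rho^2)}\right\} dv$$

If $z = (v - \rho u)/\sqrt{1-\rho^2}$, then $dv = \sqrt{1-\rho^2}\,dz$ and

$$f(x) = \frac{e^{-u^2/2}}{2\pi\sigma_x} \int_{-\infty}^{\infty} e^{-z^2/2}\,dz$$

Substituting back the value of u in terms of x and inserting the value
of this familiar integral reduces f(x) to

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma_x}\, e^{-\frac{1}{2}\left(\frac{x-m_x}{\sigma_x}\right)^2} \tag{13}$$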
Since the corresponding result for y follows from symmetry, (13)
shows that the marginal distributions of a joint normal distribution
are normal. By means of (13), one may also show that several of the
product moment definitions are consistent with the labeling of the
parameters in (12). For example, it follows at once from (13) that
the coefficient of the exponential in (12) is such as to produce unit
volume under the surface z = f(x, y).

For the particular case of uncorrelated variables, $\rho = 0$ and then
(12) reduces to

$$f(x, y) = \frac{e^{-\frac{1}{2}\left(\frac{x-m_x}{\sigma_x}\right)^2}}{\sqrt{2\pi}\,\sigma_x} \cdot \frac{e^{-\frac{1}{2}\left(\frac{y-m_y}{\sigma_y}\right)^2}}{\sqrt{2\pi}\,\sigma_y} = f(x)f(y)$$

Because of definition (2), Chapter IV, this result shows that, if two
normal variables are uncorrelated, they are independently distributed.
From the discussion of curvilinear correlation in Chapter V, it should
be clear that a lack of correlation does not guarantee a lack of
relationship in general.
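The reduction of (12) to (13) can also be confirmed numerically. The following sketch is an illustration under stated assumptions: the parameter values are hypothetical, and the infinite range of y is replaced by ten standard deviations on either side of the mean of y, which is ample for these values. It also checks the factorization of (12) when the correlation is zero.

```python
import math


def joint_normal(x, y, m_x, m_y, s_x, s_y, rho):
    """Joint normal density as written in definition (12)."""
    u = (x - m_x) / s_x
    v = (y - m_y) / s_y
    bracket = u * u - 2.0 * rho * u * v + v * v
    coefficient = 1.0 / (2.0 * math.pi * s_x * s_y * math.sqrt(1.0 - rho * rho))
    return coefficient * math.exp(-bracket / (2.0 * (1.0 - rho * rho)))


def normal(x, m, s):
    """One-variable normal density, the form of (13)."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (math.sqrt(2.0 * math.pi) * s)


def marginal_x(x, m_x, m_y, s_x, s_y, rho, n=4000, width=10.0):
    """Integrate (12) over y by the midpoint rule on m_y +/- width * s_y."""
    lower = m_y - width * s_y
    h = 2.0 * width * s_y / n
    return h * sum(
        joint_normal(x, lower + (i + 0.5) * h, m_x, m_y, s_x, s_y, rho)
        for i in range(n)
    )


# Hypothetical parameter values, chosen only for illustration.
params = dict(m_x=1.0, m_y=-2.0, s_x=2.0, s_y=0.5, rho=0.6)

for x in (-1.0, 1.0, 2.5):
    print(x, marginal_x(x, **params), normal(x, params["m_x"], params["s_x"]))

# With rho = 0 the joint density factors into the product of the marginals.
uncorrelated = dict(params, rho=0.0)
lhs = joint_normal(0.3, -1.7, **uncorrelated)
rhs = normal(0.3, 1.0, 2.0) * normal(-1.7, -2.0, 0.5)
print(lhs, rhs)
```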

3. Array Distributions 

A joint normal distribution possesses array distributions with unusually
interesting properties. It will suffice to study the x array distribution
function, $f_x(y)$. A direct application of definition (7) to (12) and
(13) gives

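$$f_x(y) = \frac{\dfrac{e^{-\frac{1}{2(1-\rho^2)}\left[u^2 - 2\rho uv + v^2\right]}}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}}{\dfrac{e^{-u^2/2}}{\sqrt{2\pi}\,\sigma_x}}$$

$$= \frac{e^{-\frac{1}{2(1-\rho^2)}\left[v^2 - 2\rho uv + \rho^2u^2\right]}}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}$$

$$= \frac{e^{-\frac{1}{2}\left[\frac{v - \rho u}{\sqrt{1-\rho^2}}\right]^2}}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}}$$

If the values of u and v in terms of x and y are inserted and if the value
of y is denoted by $y_x$ to show its dependence on x, $f_x(y)$ reduces to

$$f_x(y) = \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2}\left[\frac{y_x - m_y - \rho\frac{\sigma_y}{\sigma_x}(x - m_x)}{\sigma_y\sqrt{1-\rho^2}}\right]^2\right\} \tag{14}$$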


Since x has a fixed value and $y_x$ is the variable, (14) shows that $y_x$
possesses a normal distribution with mean $m_y + \rho\frac{\sigma_y}{\sigma_x}(x - m_x)$ and
standard deviation $\sigma_y\sqrt{1-\rho^2}$. Thus, the array distributions, as
well as the marginal distributions, of a joint normal distribution are
normal. Since by definition (8) a curve of regression is the locus of
the means of array distributions, it follows from (14) that the curve of
regression of y on x for x and y jointly normally distributed is the
straight line

$$\bar{y}_x = m_y + \rho\frac{\sigma_y}{\sigma_x}(x - m_x) \tag{15}$$

A comparison of this equation with

$$y' = \bar{y} + r\frac{s_y}{s_x}(x - \bar{x})$$

which is the least-squares line of regression for a set of points as given 
by (10), Chapter V, shows that for normal variables this least-squares 
line may be treated as a sample approximation to the population line 
of regression. Since it is not unusual to find variables that are
approximately normally distributed, one might expect to find related pairs of
such variables which possess an approximate joint normal distribution.
For such variables one would expect to find approximate linear
regression and to find that r would serve as a satisfactory measure of
the usefulness of such regression lines for estimating purposes. These
theoretical results help to make the use of linear regression and
correlation coefficients seem more logical and also help to explain why so
many sets of data of two variables can be treated satisfactorily by
these simpler methods.

Another feature of (14) which is of practical importance is that all the
x array distributions possess the same standard deviation $\sigma_y\sqrt{1-\rho^2}$.
This characteristic is sometimes denoted by the term homoscedasticity.
Since $\sigma_y\sqrt{1-\rho^2}$ is the theoretical standard deviation of the errors of
estimation, which have been shown to be normally distributed, this
property implies that the precision of the estimation of y from x is
the same for all values of x. Thus, if one is interested in predicting
only y's for a fixed x but has data for other values of x as well, he may
compute his standard error of estimate based on all the data, namely
$s_y\sqrt{1-r^2}$, and use it as his standard error of estimate for the y's
that interest him. It is clear that the value of $s_y\sqrt{1-r^2}$ based on all
the data would be much more reliable than the standard error of
estimate computed directly from only those y's that are of particular
interest.
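The two properties of the array distributions, a mean lying on the line (15) and a standard deviation that does not depend on x, can be seen by integrating (12) directly for several fixed values of x. The sketch below uses hypothetical parameter values and a midpoint rule over ten standard deviations of y; neither choice comes from the text.

```python
import math


def joint_normal(x, y, m_x, m_y, s_x, s_y, rho):
    """Joint normal density as written in definition (12)."""
    u = (x - m_x) / s_x
    v = (y - m_y) / s_y
    bracket = u * u - 2.0 * rho * u * v + v * v
    coefficient = 1.0 / (2.0 * math.pi * s_x * s_y * math.sqrt(1.0 - rho * rho))
    return coefficient * math.exp(-bracket / (2.0 * (1.0 - rho * rho)))


def array_moments(x, m_x, m_y, s_x, s_y, rho, n=4000, width=10.0):
    """Mean and standard deviation of y for fixed x, by integrating (12)."""
    lower = m_y - width * s_y
    h = 2.0 * width * s_y / n
    ys = [lower + (i + 0.5) * h for i in range(n)]
    weights = [joint_normal(x, y, m_x, m_y, s_x, s_y, rho) for y in ys]
    mass = sum(weights)  # proportional to f(x); the step h cancels in the ratios
    mean = sum(y * w for y, w in zip(ys, weights)) / mass
    variance = sum((y - mean) ** 2 * w for y, w in zip(ys, weights)) / mass
    return mean, math.sqrt(variance)


# Hypothetical parameter values, chosen only for illustration.
params = dict(m_x=1.0, m_y=-2.0, s_x=2.0, s_y=0.5, rho=0.6)
slope = params["rho"] * params["s_y"] / params["s_x"]
common_sd = params["s_y"] * math.sqrt(1.0 - params["rho"] ** 2)

for x in (-1.0, 1.0, 4.0):
    mean, sd = array_moments(x, **params)
    on_line = params["m_y"] + slope * (x - params["m_x"])  # equation (15)
    print(x, round(mean, 6), round(on_line, 6), round(sd, 6), round(common_sd, 6))
```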


4. Normal Surface 


Instead of thinking in terms of probability density in the plane,
consider now the geometry of (12), treating it as the equation of a surface
in three dimensions. If (7) and the particular results (13) and (14)
are applied, the equation of this surface may be written

$$z = f(x) \cdot \frac{1}{\sqrt{2\pi}\,\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2}\left[\frac{y - m_y - \rho\frac{\sigma_y}{\sigma_x}(x - m_x)}{\sigma_y\sqrt{1-\rho^2}}\right]^2\right\} \tag{16}$$


For the purpose of studying this surface, consider its intersections
with planes perpendicular to the x axis. The equations of the
intersecting curves are obtained by replacing x by the constant values
corresponding to the cutting planes. From (16) it will be observed
that these curves are normal curves with their means lying on the
regression line (15), all having the same standard deviation $\sigma_y\sqrt{1-\rho^2}$, and
varying in maximum height according to the factor f(x). The tallest
such normal curve will be the one lying in the cutting plane $x = m_x$,
since this value makes f(x) a maximum. By symmetry, planes
perpendicular to the y axis will intersect the surface in normal curves with
corresponding properties.

Fig. 3. Normal correlation surface.

Further information is obtained by considering the intersection of
the surface by planes perpendicular to the z axis. In this connection
it is more convenient to use the original form (12) with f(x, y) replaced
by z. If z assumes different constant values, the quantity in brackets
will assume corresponding values that can be calculated from the
constant values assigned to z. Hence the equations of such intersecting
curves may be written in the form

$$\left(\frac{x-m_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-m_x}{\sigma_x}\right)\left(\frac{y-m_y}{\sigma_y}\right) + \left(\frac{y-m_y}{\sigma_y}\right)^2 = k$$


where k corresponds to the selected value of z. Since this is a quadratic
function in x and y, these curves of intersection must be conic sections.
Furthermore, since the type of conic section depends only on the
quadratic terms, the discriminant for testing conic sections may be
applied directly to give

$$B^2 - 4AC = \frac{4\rho^2}{\sigma_x^2\sigma_y^2} - \frac{4}{\sigma_x^2\sigma_y^2} = -\frac{4(1-\rho^2)}{\sigma_x^2\sigma_y^2} \le 0$$

where A, B, and C denote the coefficients of $x^2$, $xy$, and $y^2$.
This result shows that the intersecting curves are ellipses, except in the
trivial case of $\rho = \pm 1$. Allowing k to assume different values will
merely change the sizes of these ellipses; consequently these ellipses
have the same centers and the same orientation of principal axes. It
will be found upon rotating axes properly to eliminate the xy term that
the principal axis of these ellipses is not parallel to a line of regression
as might be supposed. The line of regression turns out to be parallel
to a diameter of the ellipses.
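The statement that the principal axis is not parallel to the line of regression can be illustrated with numbers. The sketch below assumes hypothetical values of the two standard deviations and the correlation coefficient, and uses the standard rotation-of-axes result that the xy term vanishes when tan 2θ = B/(A − C). That result is taken from analytic geometry and not from the text.

```python
import math

# Hypothetical parameter values, chosen only for illustration.
sigma_x, sigma_y, rho = 2.0, 1.0, 0.6

# Coefficients of the quadratic terms A*x^2 + B*x*y + C*y^2 in the bracket
# of (12), with x and y measured from their means.
A = 1.0 / sigma_x**2
B = -2.0 * rho / (sigma_x * sigma_y)
C = 1.0 / sigma_y**2

# A negative discriminant identifies the curves of intersection as ellipses.
discriminant = B * B - 4.0 * A * C

# Rotation angle of the major axis; equivalent to tan(2*theta) = B / (A - C).
theta = 0.5 * math.atan2(2.0 * rho * sigma_x * sigma_y, sigma_x**2 - sigma_y**2)
major_axis_slope = math.tan(theta)

# Coefficient of the cross term after rotating the axes through theta.
cross_term = (C - A) * math.sin(2.0 * theta) + B * math.cos(2.0 * theta)

# Slope of the line of regression of y on x, from (15).
regression_slope = rho * sigma_y / sigma_x

print("discriminant:", discriminant)
print("cross term after rotation:", cross_term)
print("slope of major axis:", major_axis_slope)
print("slope of regression line:", regression_slope)
```

For these values the major axis is steeper than the line of regression of y on x, so the two directions differ.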

A sketch of a normal correlation surface which shows these various
geometrical properties is given in Fig. 3.

EXERCISES

1. Could the function + serve as a distribution function of x and y over 
some region? 

2. Verify by integration that definition (12) is consistent with the moment 
properties given by (11) and immediately following. 

3. Given/(x, ?/) = 1 |q ^ («) w»', Q>) m ot '> ( c ) p, (d) y x . 

4. Given /(x, = | q < ^ < ^} > find ( a ) ( 6 ) P> 0) Vx- 

5. Prove or disprove that all vertical plane sections of a normal correlation
surface are normal curves. 

6. If x and y are uncorrelated and normally distributed, find the mean and 
variance of xy. 

7. Prove that the line of regression for a normal distribution is a diameter of 
the ellipses of constant probability. 

8. Assume that a bomber is making a bombing run in the direction along the
positive y axis at a square target 200 feet by 200 feet whose center is at the origin
and whose sides are parallel to the coordinate axes. Assume further that the x
and y errors in repeated bombing runs are normally distributed about zero.

(a) If the x and y errors are also independently distributed with $\sigma$ = 400 feet,
find the probability that the target will be hit on the first run.

(b) Under the conditions of (a), find the probability of getting at least 1 hit in
10 runs.

(c) Under these same conditions, how many runs, that is how many planes, 
would be needed to make the probability at least 0.9 of getting at least 1 hit on the 
target? 

(d) Show why it would be difficult to work (a) if the x and y errors were corre¬ 
lated, say, with p = H- 



CHAPTER VII 


FREQUENCY DISTRIBUTIONS OF MORE THAN 
TWO VARIABLES 

MULTIPLE LINEAR REGRESSION 

It happens quite often that the methods of Chapter V for estimating 
one variable by means of a related variable yield poor results, not 
because the relationship is far removed from the linear one assumed 
there but because there is no single variable sufficiently closely related 
to the variable being estimated to yield good results. However, it 
may happen that there are several variables which, when taken jointly, 
will serve as a satisfactory basis for estimating the desired variable. 
Since linear functions are so simple to manipulate and since experience 
shows that many sets of variables are approximately linearly related, 
it is reasonable to attempt to estimate the desired variable by means 
of a linear function of the remaining variables. For this purpose, let
$X_1, X_2, \cdots, X_k$ represent the k variables available, and consider the
problem of estimating variable $X_1$ by means of a linear function of the
remaining variables. If the estimated value of $X_1$ is denoted by $X_1'$,
the relationship may be expressed as

$$X_1' = c_0 + c_2X_2 + c_3X_3 + \cdots + c_kX_k \tag{1}$$

where the c's are to be determined by means of available data.
Geometrically, the problem is one of fitting a plane to a set of points in k
dimensions.

Suppose that a sample of size n is available for these k variables. 
For example, n different skilled workmen may have been rated by their 
foreman and been given k — 1 different tests designed to measure 
ability in that particular type of work. Then it would be of interest 
to see whether success in this type of work could be estimated well by 
means of a linear combination of the k — 1 test scores. 

As in dealing with two variables, the unknown c's in (1) will be
determined by the principle of least squares; consequently the c's
will be chosen to minimize $\sum (X_1 - X_1')^2$, where the sum extends over
the n sample values. It is more convenient, however, to work with
variables measured from their sample means than with the variables
themselves; hence first let

$$x_i = X_i - \bar{X}_i \qquad (i = 1, 2, \cdots, k)$$

If $x_1'$ is defined by $x_1' = X_1' - \bar{X}_1$, then

$$X_1 - X_1' = (x_1 + \bar{X}_1) - (x_1' + \bar{X}_1) = x_1 - x_1' \tag{2}$$

If now the capital X's in (1) are expressed in terms of the small x's,
that equation can be written

$$x_1' = a_0 + a_2x_2 + a_3x_3 + \cdots + a_kx_k \tag{3}$$

in which the a's could be expressed in terms of the c's and $\bar{X}$'s if so
desired. However, from (2) it is clear that minimizing $\sum (X_1 - X_1')^2$
is equivalent to minimizing $\sum (x_1 - x_1')^2$; consequently one can just
as well determine the a's so as to minimize the latter sum, which because
of (3) may be written

$$G(a_0, a_2, \cdots, a_k) = \sum [x_1 - a_0 - a_2x_2 - \cdots - a_kx_k]^2$$

As in polynomial fitting, the normal equations of least squares are
obtained by setting the k partial derivatives of G equal to zero. If,
as was done following (14), Chapter V, these equations are multiplied
by $-\frac{1}{2}$, the summations performed term by term, and the first sum
transposed to the right side, these equations assume the form

$$a_0 n + a_2\sum x_2 + \cdots + a_k\sum x_k = \sum x_1$$

$$a_0\sum x_2 + a_2\sum x_2^2 + \cdots + a_k\sum x_2x_k = \sum x_2x_1$$

$$\cdots$$

$$a_0\sum x_k + a_2\sum x_kx_2 + \cdots + a_k\sum x_k^2 = \sum x_kx_1$$

Since $\sum x_i = \sum (X_i - \bar{X}_i) = 0$, all terms in the first equation, except
the first term, vanish. This implies that $a_0 = 0$, and thus the number
of equations to be solved has been reduced by 1. The advantage of
using variables measured from their sample means to simplify the
notation and solution of the normal equations should be clear from this
result. Now the coefficients in these equations can be expressed in
terms of familiar statistical quantities, because by (9), Chapter V,

$$\sum x_ix_j = \sum (X_i - \bar{X}_i)(X_j - \bar{X}_j) = n r_{ij} s_i s_j \tag{4}$$

where $r_{ij}$ denotes the sample correlation coefficient between the
variables $X_i$ and $X_j$, and $s_i$ is the sample standard deviation of $X_i$.
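The effect of measuring the variables from their sample means can be seen on a small data set. The sketch below uses hypothetical observations for k = 3 variables, invented only for illustration. With the constant term equal to zero, the two remaining normal equations are solved by Cramer's rule and the residuals are checked against the normal equations.

```python
# Hypothetical sample of n = 5 observations on three variables.
X1 = [10.0, 12.0, 15.0, 19.0, 24.0]
X2 = [1.0, 2.0, 3.0, 4.0, 5.0]
X3 = [2.0, 1.0, 4.0, 3.0, 6.0]


def centered(values):
    """Measure a variable from its sample mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(u, v):
    """Sum of products, the form of every coefficient in the normal equations."""
    return sum(a * b for a, b in zip(u, v))


x1, x2, x3 = centered(X1), centered(X2), centered(X3)

# The sums of the centered variables vanish, so a_0 = 0 and only two
# normal equations remain:
#   a2 * S22 + a3 * S23 = S21
#   a2 * S23 + a3 * S33 = S31
S22, S23, S33 = dot(x2, x2), dot(x2, x3), dot(x3, x3)
S21, S31 = dot(x2, x1), dot(x3, x1)

determinant = S22 * S33 - S23 * S23
a2 = (S21 * S33 - S23 * S31) / determinant
a3 = (S22 * S31 - S23 * S21) / determinant

residuals = [p - a2 * q - a3 * r for p, q, r in zip(x1, x2, x3)]

print("a2 =", a2, " a3 =", a3)
print("sum of residuals:", sum(residuals))
print("residuals against x2 and x3:", dot(residuals, x2), dot(residuals, x3))
```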




If such expressions are inserted for all sums, and common factors are
canceled, the normal equations become

$$a_2r_{22}s_2 + a_3r_{23}s_3 + \cdots + a_kr_{2k}s_k = r_{21}s_1$$

$$a_2r_{32}s_2 + a_3r_{33}s_3 + \cdots + a_kr_{3k}s_k = r_{31}s_1$$

$$\cdots$$

$$a_2r_{k2}s_2 + a_3r_{k3}s_3 + \cdots + a_kr_{kk}s_k = r_{k1}s_1$$

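The solution of these equations may be expressed in a convenient
form by means of determinants, provided the determinant of the
coefficients is not zero. Solving for $a_i$ and factoring out all common factors
from both numerator and denominator determinants gives

$$a_i = \frac{\begin{vmatrix} r_{22} & r_{23} & \cdots & r_{21} & \cdots & r_{2k} \\ r_{32} & r_{33} & \cdots & r_{31} & \cdots & r_{3k} \\ \vdots & \vdots & & \vdots & & \vdots \\ r_{k2} & r_{k3} & \cdots & r_{k1} & \cdots & r_{kk} \end{vmatrix} \; s_2 \cdots s_{i-1}\, s_1\, s_{i+1} \cdots s_k}{\begin{vmatrix} r_{22} & r_{23} & \cdots & r_{2i} & \cdots & r_{2k} \\ r_{32} & r_{33} & \cdots & r_{3i} & \cdots & r_{3k} \\ \vdots & \vdots & & \vdots & & \vdots \\ r_{k2} & r_{k3} & \cdots & r_{ki} & \cdots & r_{kk} \end{vmatrix} \; s_2 \cdots s_i \cdots s_k} \tag{5}$$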
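It should be noted that these two determinants differ only in the
elements occurring in the i − 1 column. For the purpose of evaluating
these determinants, consider the following determinant:

$$R = \begin{vmatrix} r_{11} & r_{12} & r_{13} & \cdots & r_{1k} \\ r_{21} & r_{22} & r_{23} & \cdots & r_{2k} \\ r_{31} & r_{32} & r_{33} & \cdots & r_{3k} \\ \vdots & \vdots & \vdots & & \vdots \\ r_{k1} & r_{k2} & r_{k3} & \cdots & r_{kk} \end{vmatrix} \tag{6}$$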


It will be observed that the determinant in the denominator of (5)
is the minor of $r_{11}$ in R. It will also be observed that, if the column in
the numerator determinant of (5) which is headed by $r_{21}$ is shifted to
the first column position, the resulting determinant will be the minor
of $r_{1i}$ in R. Since the interchange of two columns of a determinant
changes the sign of the determinant, and i − 2 such interchanges
are needed to bring this column into the first-column position, the
value of the numerator determinant will be the minor of $r_{1i}$ multiplied
by $(-1)^{i-2}$. By using cofactors rather than minors, the question of
the proper sign is answered at once. The cofactor of an element $r_{ij}$
is defined as $(-1)^{i+j}$ times the minor of $r_{ij}$ and is usually denoted by the
corresponding capital letter $R_{ij}$. Since the numerator determinant of
(5) is $(-1)^{i-2}$ times the minor of $r_{1i}$, which is equivalent to $-(-1)^{i+1}$
times the minor of $r_{1i}$, and since $R_{1i} = (-1)^{1+i}$ times the minor of
$r_{1i}$, it follows that the value of this determinant is $-R_{1i}$. This result,
together with the fact that the denominator determinant is equal to
$R_{11}$, reduces (5) to

$$a_i = -\frac{s_1R_{1i}}{s_iR_{11}}$$

If these values are substituted into (3), the equation of the least-squares
regression plane becomes

$$\frac{R_{11}}{s_1}x_1' + \frac{R_{12}}{s_2}x_2 + \frac{R_{13}}{s_3}x_3 + \cdots + \frac{R_{1k}}{s_k}x_k = 0 \tag{7}$$


As an illustration of the application of this formula, consider the
following information concerning the three variables $X_1$, the amount
of hay in units of hundreds of pounds per acre, $X_2$, the spring rainfall
in inches, and $X_3$, the accumulated temperature above 42° F. in spring.

$$\begin{array}{lll} \bar{X}_1 = 28.0, & s_1 = 4.4, & r_{12} = 0.80 \\ \bar{X}_2 = 4.91, & s_2 = 1.10, & r_{13} = -0.40 \\ \bar{X}_3 = 594, & s_3 = 85, & r_{23} = -0.56 \end{array}$$

Here

$$R = \begin{vmatrix} 1 & 0.80 & -0.40 \\ 0.80 & 1 & -0.56 \\ -0.40 & -0.56 & 1 \end{vmatrix}, \qquad R_{11} = \begin{vmatrix} 1 & -0.56 \\ -0.56 & 1 \end{vmatrix} = 0.69$$

$$R_{12} = -\begin{vmatrix} 0.80 & -0.56 \\ -0.40 & 1 \end{vmatrix} = -0.58, \qquad R_{13} = \begin{vmatrix} 0.80 & 1 \\ -0.40 & -0.56 \end{vmatrix} = -0.05$$

If these values and the values of the s's are inserted in (7), the equation
of the desired regression plane becomes

$$0.16x_1' - 0.53x_2 - 0.0006x_3 = 0$$

If this equation is expressed in terms of the original variables, it
reduces to

$$X_1' = 3.3X_2 + 0.004X_3 + 9.5$$

This equation indicates that, if $X_3$ is held fixed, the amount of hay
increases about 330 pounds per acre with each inch increase in spring
rainfall. On the other hand, if spring rainfall is held fixed, the
accumulated spring temperature would have to increase about 3 standard
deviations, which is 255 units, in order to increase the amount of hay
by 100 pounds per acre. Thus, it appears that the spring temperature
is relatively of little importance compared with spring rainfall. Such
conclusions, of course, are only approximately true. They depend upon
the variables' being approximately linearly related, and they express
only average relationships.

The relative importance of the independent variables in a regression
equation can be ascertained by comparing the coefficients of those
variables after the equation has been written with all variables
expressed in standard units. Since it is merely necessary to write each
standard deviation in (7) under the corresponding x to express the
variables in standard units, the relative importance of a variable $X_i$
for estimating $X_1$ is determined by $R_{1i}$. In this problem $R_{12} = -0.58$
and $R_{13} = -0.05$; hence rainfall is much more important than
temperature for hay production under these conditions.
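The arithmetic of the hay illustration can be retraced as follows. This is a sketch only: the helper `cofactor_first_row` is a hypothetical name, and because the cofactors are carried here without rounding, the resulting coefficients differ slightly from the rounded values 3.3, 0.004, and 9.5 quoted in the text.

```python
# Sample correlation coefficients, standard deviations, and means from the text.
r = [
    [1.0, 0.80, -0.40],
    [0.80, 1.0, -0.56],
    [-0.40, -0.56, 1.0],
]
s = [4.4, 1.10, 85.0]
means = [28.0, 4.91, 594.0]


def cofactor_first_row(j):
    """Cofactor R_1j (j = 1, 2, 3) of the 3x3 correlation determinant R."""
    cols = [c for c in range(3) if c != j - 1]
    minor = r[1][cols[0]] * r[2][cols[1]] - r[1][cols[1]] * r[2][cols[0]]
    return (-1) ** (1 + j) * minor


R11, R12, R13 = (cofactor_first_row(j) for j in (1, 2, 3))

# Coefficients of the regression plane (7), written as R_1i / s_i.
plane = [R11 / s[0], R12 / s[1], R13 / s[2]]

# Regression coefficients a_i = -(s_1 R_1i) / (s_i R_11).
a2 = -s[0] * R12 / (s[1] * R11)
a3 = -s[0] * R13 / (s[2] * R11)

# Constant term when the equation is written in the original variables.
c0 = means[0] - a2 * means[1] - a3 * means[2]

print("R11, R12, R13:", round(R11, 4), round(R12, 4), round(R13, 4))
print("coefficients of (7):", [round(c, 4) for c in plane])
print("X1' = %.2f X2 + %.4f X3 + %.2f" % (a2, a3, c0))
```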

As in polynomial fitting, if the number of variables is large, the 
solution of the normal equations may be expedited by the Doolittle 
technique. 


STANDARD ERROR OF ESTIMATE 

After a regression equation has been obtained, it is important to 
know how useful the equation is for estimating purposes. With two 
variables, it was found that the correlation coefficient gave a measure 
of the usefulness of the regression line for estimating y from x. For 
more than two variables, it is possible to generalize the definition of 
the correlation coefficient to give such a measure. Toward this end, 
consider the variance of the errors of estimation. 

Because the x's are measured from their means, it follows from (7)
that

$$\bar{e} = \frac{1}{n}\sum (x_1 - x_1') = \frac{1}{n}\left\{\sum x_1 - \sum x_1'\right\} = 0$$

consequently

$$s_e^2 = \frac{1}{n}\sum (e - \bar{e})^2 = \frac{1}{n}\sum e^2 = \frac{1}{n}\sum (x_1 - x_1')^2$$

If the value of $x_1'$ given by (7) is inserted and common terms are
factored out, this variance becomes

$$s_e^2 = \frac{s_1^2}{nR_{11}^2} \sum \left[R_{11}\frac{x_1}{s_1} + R_{12}\frac{x_2}{s_2} + \cdots + R_{1k}\frac{x_k}{s_k}\right]^2$$

The right side will simplify readily if the squaring is performed by
multiplying the quantity in brackets in turn by each member inside
and then performing all summations on the x's with the common factor
1/n inserted. If such sums are expressed in terms of correlation
coefficients by means of (4), $s_e^2$ can be expressed as

$$s_e^2 = \frac{s_1^2}{R_{11}^2} \Big\{ R_{11}\left[r_{11}R_{11} + r_{12}R_{12} + \cdots + r_{1k}R_{1k}\right] + R_{12}\left[r_{21}R_{11} + r_{22}R_{12} + \cdots + r_{2k}R_{1k}\right] + \cdots + R_{1k}\left[r_{k1}R_{11} + r_{k2}R_{12} + \cdots + r_{kk}R_{1k}\right] \Big\}$$

From (6) it will be observed that the first pair of brackets above con¬ 
tains the expansion of R by means of minors of elements of the first 
row. The second pair of brackets contains the sum of products of the 
elements of the second row of R by the cofactors of the elements of the 
first row. But this is the expansion of the determinant obtained by 
replacing the first row of R by the second row of R. Since the value 
of a determinant with two rows alike is zero, the value of the quantity 
inside the second pair of brackets is zero. In the same manner it can 
be shown that the remaining quantities in brackets vanish; hence $s_e^2$
reduces to

$$s_e^2 = s_1^2\,\frac{R}{R_{11}} \tag{8}$$

Although this formula is useful for measuring the precision of (7)
for estimating $X_1$, it was derived here primarily for application in the
next section.
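The determinant argument leading to (8) can be checked on the correlation coefficients of the hay illustration. In the sketch below, `cofactor` is a hypothetical helper. The first-row expansion reproduces R, the expansions by the second and third rows against the first-row cofactors vanish, and (8) then gives the standard error of estimate.

```python
import math

# Correlation determinant of the hay illustration and the standard deviation of X_1.
r = [
    [1.0, 0.80, -0.40],
    [0.80, 1.0, -0.56],
    [-0.40, -0.56, 1.0],
]
s_1 = 4.4


def cofactor(i, j):
    """Cofactor of r[i][j] (0-based indices) in the 3x3 determinant R."""
    rows = [a for a in range(3) if a != i]
    cols = [b for b in range(3) if b != j]
    minor = (r[rows[0]][cols[0]] * r[rows[1]][cols[1]]
             - r[rows[0]][cols[1]] * r[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor


first_row_cofactors = [cofactor(0, j) for j in range(3)]

# Elements of the first row times their own cofactors give R.
R = sum(r[0][j] * first_row_cofactors[j] for j in range(3))

# Elements of another row times the first-row cofactors give zero.
second_row_sum = sum(r[1][j] * first_row_cofactors[j] for j in range(3))
third_row_sum = sum(r[2][j] * first_row_cofactors[j] for j in range(3))

# Formula (8): s_e^2 = s_1^2 * R / R_11.
s_e = s_1 * math.sqrt(R / first_row_cofactors[0])

print("R =", round(R, 4))
print("second and third row sums:", second_row_sum, third_row_sum)
print("standard error of estimate:", round(s_e, 3))
```

The standard error is in the units of $X_1$, that is, hundreds of pounds of hay per acre.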

MULTIPLE CORRELATION COEFFICIENT 


The multiple correlation coefficient is defined by analogy with the
definition of the ordinary correlation coefficient for two variables given
by (5), Chapter V. To distinguish it from the ordinary correlation
coefficient, it is denoted by $r_{1.23\cdots k}$. By definition

$$r_{1.23\cdots k}^2 = 1 - \frac{s_e^2}{s_1^2}$$

The same reasoning that was followed for two variables may be
applied here to show that $r_{1.23\cdots k}$ may serve as a measure of the
usefulness of the regression plane for estimating purposes. Although there
are k possible regression planes and hence k multiple correlation
coefficients, ordinarily there is but one variable which it is desired to
estimate; consequently $r_{1.23\cdots k}$ is usually the only multiple correlation
coefficient of interest. If $r_{1.23\cdots k}$ is close to 1, the n points
corresponding to the sample of n must lie near the regression plane and therefore
the k variables are likely to be approximately linearly related, thereby
justifying the use of a linear function. However, if $r_{1.23\cdots k}$ is close to
0, the relationship is either a weak linear one or a curvilinear
relationship of unknown strength. For example, $r_{1.23\cdots k}$ would be small if
the n points were somewhat uniformly distributed inside a
parallelepiped with sides parallel to the coordinate planes or if the points
lay near the surface of a hemisphere with its base perpendicular to
the $X_1$ axis.

The following formula for calculating $r_{1.23\cdots k}$ as a function of
ordinary correlation coefficients is obtained by inserting the value of $s_e^2$
given by (8) into the definition. Thus,

$$r_{1.23\cdots k} = \sqrt{1 - \frac{R}{R_{11}}} \tag{9}$$

It is customary to choose the positive root here. Incidentally, it can
be shown that R and $R_{11}$ are non-negative and that $R \le R_{11}$;
consequently an imaginary value for $r_{1.23\cdots k}$ indicates an error in
computations.

For the illustrative problem of the first section, calculations give

$$r_{1.23} = \sqrt{1 - \frac{0.24}{0.69}} = 0.81$$


Since the regression plane cannot give a worse fit than the regression
line in the $X_1$, $X_2$ plane, the multiple correlation coefficient must be
at least as large as any simple correlation coefficient with the integer
1 as one of its subscripts. The fact that $r_{1.23}$ is only slightly larger
than $r_{12}$ shows that the variable $X_3$ contributed practically nothing
toward the usefulness of this regression plane for estimating hay yield.
One could have estimated about as well with the regression line of
$X_1$ on $X_2$.
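Formula (9) is easily evaluated for three variables. The sketch below relies on the expanded form of a 3 by 3 correlation determinant, $R = 1 + 2r_{12}r_{13}r_{23} - r_{12}^2 - r_{13}^2 - r_{23}^2$, which is a standard algebraic identity and not a formula quoted from the text. Carried without rounding, the result is about 0.80; the text's value of 0.81 comes from the rounded ratio 0.24/0.69.

```python
import math

# Sample correlation coefficients of the hay illustration.
r12, r13, r23 = 0.80, -0.40, -0.56

# Expanded 3x3 correlation determinant and the cofactor of r_11.
R = 1.0 + 2.0 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
R11 = 1.0 - r23**2

# Formula (9), taking the positive root.
r_1_23 = math.sqrt(1.0 - R / R11)

print("R =", round(R, 4), " R11 =", round(R11, 4))
print("multiple correlation coefficient:", round(r_1_23, 4))
print("gain over r12:", round(r_1_23 - abs(r12), 4))
```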

PARTIAL CORRELATION COEFFICIENT 


When several variables are interrelated, the simple correlation
coefficients between pairs of such variables may give misleading
information. For example, for the illustrative exercise of the first section, the
fact that $r_{13} = -0.40$ would seem to indicate that, the warmer the
weather was in the spring, the less was the yield of hay. However,
this fairly large negative correlation is due to the fact that rainy
weather is usually cool weather and that hay yield increases with
rainfall. Thus, the true relationship between temperature and hay
production is masked by the effect of rainfall on these variables. In
order to study the true relationship between two such variables, it is
necessary to hold all other closely related variables fixed. The
correlation coefficient between two variables when the remaining variables
under consideration are held fixed is known as the partial correlation
coefficient. In practice, this correlation coefficient will usually vary
with the particular values assigned to the remaining variables;
consequently it is customary to define the partial correlation coefficient
somewhat differently. Furthermore, if one had a sample of size 100
or less, the amount of data available for calculating a correlation
coefficient for particular small ranges of values of the remaining
variables would be so small as to yield a coefficient of questionable validity.

To obtain a formula that may serve as the definition of the partial
correlation coefficient, consider the following idealized situation for
three variables. Instead of having but n points in three dimensions,
let the three-dimensional space with the $X_1$ axis vertical be filled with
a probability density distribution analogous to the procedure with
two dimensions in the section on probability density in Chapter VI.
Then consider the density distribution in the vertical plane whose
equation is $X_3 = k$. From the discussion concerning (8), Chapter VI,
there exists a regression curve of $X_1$ on $X_2$ in this plane which is the
locus of the mean points of vertical array distributions. If $X_3 = k$
is permitted to vary continuously, this regression curve which is a
function of $X_3$ will generate a regression surface of $X_1$ on $X_2$ and $X_3$
in the three-dimensional space. In a similar manner, there exists a
regression curve of $X_2$ on $X_1$ in the plane $X_3 = k$ which is the locus
of the mean points of horizontal array distributions. As the variable
$X_3 = k$ assumes continuously changing values, this regression curve
generates a second regression surface, now of $X_2$ on $X_1$ and $X_3$. From
the preceding geometry it follows that, if the equations of the two
regression surfaces are known, the equations of the two regression
curves in the $X_3 = k$ plane could be obtained by merely replacing
$X_3$ by k in the surface equations. The derivation of the formula for
partial correlations rests upon this relation between regression curves
and regression surfaces. The geometry of the relationship is illustrated
in Fig. 1 for linear regression.

For a finite number, n, of points in three dimensions, such an idealized
situation can be only approximated. It is possible to approximate a
regression surface by its sample least-squares value provided that the
form of the surface is known. For example, if the regression surface of
$X_1$ on $X_2$ and $X_3$ is known to be a plane, equation (7) may serve as
an approximation to the population regression equation. It is also
possible, but highly unsatisfactory because of the scarcity of such
points, to approximate the regression curve of $X_1$ on $X_2$ in the $X_3 = k$
plane by using the sample points lying in or very near this plane. To
circumvent the difficulty, the relation between regression curves and
regression surfaces which was discussed in the preceding paragraph will
be utilized to obtain an approximation to the desired regression curve
in the $X_3 = k$ plane from its corresponding regression surface.

Now it will be assumed that the population regression surfaces of
$X_1$ on $X_2$ and $X_3$, and of $X_2$ on $X_1$ and $X_3$, are planes. It can be shown
that they are, for example, if the three variables are normally
distributed. As a result of the assumption, the regression curves of $X_1$ on
$X_2$ and of $X_2$ on $X_1$ lying in the $X_3 = k$ plane will be straight lines.
Now by means of (7) the equations of the least-squares approximations
to these two regression planes will be

$$x_1' = -\frac{R_{12}s_1}{R_{11}s_2}\,x_2 - \frac{R_{13}s_1}{R_{11}s_3}\,x_3 \tag{10}$$

and

$$x_2' = -\frac{R_{21}s_2}{R_{22}s_1}\,x_1 - \frac{R_{23}s_2}{R_{22}s_3}\,x_3$$

Since $x_3 = X_3 - \bar{X}_3$, if $x_3$ in these equations is replaced by its value
$x_3 = k - \bar{X}_3$, the equations will become the equations of the two lines
lying in the $X_3 = k$ plane which may serve as approximations to the
corresponding population regression lines in that plane. By means
of the resulting equations it will be possible to obtain the desired
correlation between $X_1$ and $X_2$ in the $X_3 = k$ plane. The procedure
for using the two regression line equations in a plane for obtaining the
correlation coefficient in that plane will therefore be considered next.

From (10), Chapter V, and symmetry, it follows that the equations
of the two regression lines in the x, y plane may be written in the form

$$y = r\,\frac{s_y}{s_x}\,x = m_1 x$$

and

$$x = r\,\frac{s_x}{s_y}\,y = m_2 y$$

It will be observed that r can be computed from these equations by
means of the formula

$$r = \pm\sqrt{m_1 m_2} \tag{11}$$

where the sign of the radical is chosen to be the same as that of the
slope coefficients, $m_1$ and $m_2$.
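
The relation (11) can be checked numerically. The following is a minimal sketch, assuming a small made-up data set (the values of `xs` and `ys` are hypothetical and are not taken from the text); it computes the two least-squares slopes and recovers r from their product, giving the radical the sign of the slopes.

```python
import math

def regression_slopes(xs, ys):
    """Return (m1, m2): slope of y on x and slope of x on y (deviations from means)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / syy

def r_from_slopes(m1, m2):
    """Formula (11): r = +/- sqrt(m1 * m2), with the sign of the slopes."""
    return math.copysign(math.sqrt(m1 * m2), m1)

def pearson_r(xs, ys):
    """Direct computation of r, used only as a cross-check."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a negative relationship.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.1, 3.9, 3.2, 1.8, 1.0]

m1, m2 = regression_slopes(xs, ys)
r = r_from_slopes(m1, m2)
print(m1, m2, r, pearson_r(xs, ys))
```

Both slopes always carry the sign of the sum of cross products, so their product is never negative and the radical is real.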

A direct application of (11) to the regression-line equations given
by (10) with $x_3 = k - \bar{X}_3$ yields

$$r = \pm\sqrt{\frac{R_{12}s_1}{R_{11}s_2}\cdot\frac{R_{21}s_2}{R_{22}s_1}} = \pm\frac{R_{12}}{\sqrt{R_{11}R_{22}}} \tag{12}$$

Since it is conventional to choose the sign of r to be the same as the
sign of $m_1$ and $m_2$, the negative sign should be chosen here because
$m_1 > 0$ when $R_{12} < 0$ and $m_1 < 0$ when $R_{12} > 0$. It is clear that
this formula does not depend upon what particular value of $x_3$ is
inserted in equations (10); consequently formula (12) possesses the
desirable feature of measuring the correlation between $X_1$ and $X_2$
for a fixed value of $X_3$, regardless of what fixed value is used.

This derivation does not depend on the number of variables involved
provided that the corresponding assumptions of linear regressions are
made for the additional variables. In the equations (10) one would
merely set the additional variables occurring on the right equal to
their assigned fixed values. Thus, (12) may be generalized for the case
of k variables to give the correlation coefficient between $X_1$ and $X_2$
with all other variables held fixed. This correlation coefficient is
called the partial correlation coefficient and is denoted by $r_{12.34\cdots k}$;
hence

$$r_{12.34\cdots k} = -\frac{R_{12}}{\sqrt{R_{11}R_{22}}} \tag{13}$$

Although the assumptions made here may seem rather strong, the 
geometry of the situation indicates that the formula represents a
compromise or averaging of the actual situation. Experience also indicates
that the formula is highly useful for measuring what it claims to 
measure. 
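
Formula (13) lends itself to direct computation from the cofactors of the correlation determinant. The sketch below assumes a hypothetical three-variable correlation matrix `corr` (its entries are illustrative and are not the data of the text) and compares the cofactor form with the closed form for three variables stated in exercise 3 at the end of this chapter.

```python
import math

def minor(matrix, i, j):
    """Matrix with row i and column j removed."""
    return [[v for c, v in enumerate(row) if c != j]
            for r, row in enumerate(matrix) if r != i]

def det(matrix):
    """Determinant by expansion along the first row (adequate for small matrices)."""
    if len(matrix) == 1:
        return matrix[0][0]
    return sum((-1) ** j * matrix[0][j] * det(minor(matrix, 0, j))
               for j in range(len(matrix)))

def cofactor(matrix, i, j):
    """Signed cofactor R_ij of the correlation determinant."""
    return (-1) ** (i + j) * det(minor(matrix, i, j))

def partial_corr(corr, i, j):
    """Formula (13) for variables i and j (0-based), all others held fixed."""
    return -cofactor(corr, i, j) / math.sqrt(
        cofactor(corr, i, i) * cofactor(corr, j, j))

# Hypothetical correlations r12 = 0.5, r13 = -0.4, r23 = -0.6.
corr = [[1.0, 0.5, -0.4],
        [0.5, 1.0, -0.6],
        [-0.4, -0.6, 1.0]]

r12_3 = partial_corr(corr, 0, 1)

# Closed form for three variables (exercise 3).
r12, r13, r23 = corr[0][1], corr[0][2], corr[1][2]
closed_form = (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))
print(r12_3, closed_form)
```

The same `partial_corr` function applies unchanged to k variables, since only the cofactors of the full correlation matrix enter (13).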

As an illustration of the use of (13), consider once more the data
following (7). Calculations give

$$r_{12.3} = -\frac{-0.58}{\sqrt{(0.69)(0.84)}} = 0.76$$

$$r_{13.2} = -\frac{-0.05}{\sqrt{(0.69)(0.36)}} = 0.10$$

$$r_{23.1} = -\frac{0.24}{\sqrt{(0.84)(0.36)}} = -0.44$$


The most interesting of these values is that for $r_{13.2} = 0.10$. A
comparison of this value with $r_{13} = -0.40$ shows that the latter value is
highly deceptive. Without a knowledge of partial correlations, one
might be tempted to claim that cold weather is beneficial to hay yield.
The partial correlation of $r_{13.2} = 0.10$ shows that the converse seems
to be true. If temperature and rainfall were independent rather than
negatively correlated, this seeming paradox would not have occurred.
In making statements about the relationship between two variables,
it is important to make clear whether it is one that permits the
influence of other closely related variables or whether it is one in which the
influence of certain of those related variables has been eliminated.

LINEAR DISCRIMINANT FUNCTIONS

A problem that arises quite often in science is to discriminate between 
two groups of individuals or objects on the basis of several properties 
of those individuals or objects. For example, a botanist might wish 
to classify a set of plants, some of which belong to one species and 
the rest to a second species, into their proper species by means of 
three or four measurements taken on each plant. If the two species 
were fairly similar with respect to all these measurements, it might 
not be possible to classify the plants correctly by means of any one 
measurement because of a fairly large amount of overlap in the
distributions of this measurement for the two species; however, it might be
possible to find a linear combination of these various measurements
whose distributions for the two species would possess very little
overlap. This linear combination could then be used to yield a type of
index number by means of which plants of those two species could be 
differentiated with a high percentage of success. The procedure for 
discriminating would consist in finding a critical value of the index 
such that any plant whose index value fell below the critical value 
would be classified as belonging to one species, otherwise to the other 
species. 

The principal difference between a linear discriminant function and 
an ordinary linear regression function arises from the nature of the 
dependent variable. A linear regression function uses values of the 
dependent variable to determine a linear function that will estimate 
the values of the dependent variable, whereas the discriminant
function possesses no such values or variable but uses instead a two-way
classification of the data to determine the linear function. 

Consider a set of k variables, $x_1, x_2, \ldots, x_k$, by means of which it is
desired to discriminate between two groups of individuals. Let

$$z = \lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_k x_k \tag{14}$$

represent a linear combination of these variables. The problem then
is to determine the $\lambda$'s by means of some criterion that will enable z
to serve as an index for differentiating between members of the two
groups. For the purpose of simplifying the geometrical discussion of
the problem, consider two variables with $n_1$ and $n_2$ individuals,
respectively, in the two groups. The equation

$$z = \lambda_1 x_1 + \lambda_2 x_2$$

then represents a plane in three dimensions passing through the origin
and having direction numbers $\lambda_1$, $\lambda_2$, and $-1$. If the two sets of points
corresponding to the values of $x_1$ and $x_2$ for the two groups of
individuals can be separated by means of a plane through the origin as in
Fig. 2, it is clear that the values of z corresponding to the two groups
will assume increasingly divergent negative and positive values as the
separating plane approaches perpendicularity to the $x_1$, $x_2$ plane. At
the same time, however, the variation in the values of z within a group
becomes increasingly large for both groups; consequently the increase
in the separation of the values of z for the two groups occurs at the
expense of an increase in the separation of the values of z within each
group.

Fig. 2. Example of a discriminating plane.

This situation corresponds to that in which the means of two
distributions are separating but for which the standard deviations are
increasing to such an extent that greater discrimination between the
two distributions does not necessarily result. It would be desirable,
therefore, to choose the plane that separates the values of z for the two
groups as widely as possible relative to the variation of the values of z
within the two groups. As a measure of the separation of the two
groups, it is convenient to use $(\bar{z}_1 - \bar{z}_2)^2$, where $\bar{z}_1$ and $\bar{z}_2$ are the means
of the two groups. As a measure of the variation of the values of z
within the two groups, it is convenient to use
$\sum_{i=1}^{2}\sum_{j=1}^{n_i}(z_{ij} - \bar{z}_i)^2$.
Then the desired plane will be that plane for which the $\lambda$'s are
determined to maximize the function

$$G = \frac{(\bar{z}_1 - \bar{z}_2)^2}{\sum_{i=1}^{2}\sum_{j=1}^{n_i}(z_{ij} - \bar{z}_i)^2} \tag{15}$$


Although the arguments leading to (15) were elucidated by means 
of two variables and three-dimensional geometry, they hold equally well 
for k variables; consequently the solution of the problem will be 
carried out for the general case. 

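Let $x_{pij}$ represent the value of $x_p$ for the jth individual in the ith
group, and let $\bar{x}_{pi}$ represent the mean value of $x_p$ for the $n_i$ individuals
in that group. Then from (14) it follows that

$$\bar{z}_1 - \bar{z}_2 = \lambda_1(\bar{x}_{11} - \bar{x}_{12}) + \cdots + \lambda_k(\bar{x}_{k1} - \bar{x}_{k2}) \tag{16}$$

and

$$z_{ij} - \bar{z}_i = \lambda_1(x_{1ij} - \bar{x}_{1i}) + \cdots + \lambda_k(x_{kij} - \bar{x}_{ki}) \tag{17}$$

If $d_p = \bar{x}_{p1} - \bar{x}_{p2}$, it follows from (16) that

$$(\bar{z}_1 - \bar{z}_2)^2 = (\lambda_1 d_1 + \cdots + \lambda_k d_k)^2 = \sum_{p=1}^{k}\sum_{q=1}^{k}\lambda_p\lambda_q d_p d_q$$

If $S_{pq} = \sum_{i=1}^{2}\sum_{j=1}^{n_i}(x_{pij} - \bar{x}_{pi})(x_{qij} - \bar{x}_{qi})$, it follows from (17) that

$$\begin{aligned}
\sum_{i=1}^{2}\sum_{j=1}^{n_i}(z_{ij} - \bar{z}_i)^2
&= \sum_{i=1}^{2}\sum_{j=1}^{n_i}\left[\lambda_1(x_{1ij} - \bar{x}_{1i}) + \cdots + \lambda_k(x_{kij} - \bar{x}_{ki})\right]^2 \\
&= \sum_{i=1}^{2}\sum_{j=1}^{n_i}\sum_{p=1}^{k}\sum_{q=1}^{k}\lambda_p\lambda_q(x_{pij} - \bar{x}_{pi})(x_{qij} - \bar{x}_{qi}) \\
&= \sum_{p=1}^{k}\sum_{q=1}^{k}\lambda_p\lambda_q\sum_{i=1}^{2}\sum_{j=1}^{n_i}(x_{pij} - \bar{x}_{pi})(x_{qij} - \bar{x}_{qi}) \\
&= \sum_{p=1}^{k}\sum_{q=1}^{k}\lambda_p\lambda_q S_{pq}
\end{aligned}$$

When these values are inserted in (15), it reduces to

$$G = \frac{\sum_{p=1}^{k}\sum_{q=1}^{k}\lambda_p\lambda_q d_p d_q}{\sum_{p=1}^{k}\sum_{q=1}^{k}\lambda_p\lambda_q S_{pq}} \tag{18}$$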


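Since the $\lambda$'s are to be determined to make G a maximum, it is
necessary that $\partial G/\partial\lambda_r = 0$ for $r = 1, \ldots, k$. If A and B denote the
numerator and denominator of (18), this requirement may be expressed
in the form

$$\frac{\partial G}{\partial\lambda_r} = \frac{B\,\dfrac{\partial A}{\partial\lambda_r} - A\,\dfrac{\partial B}{\partial\lambda_r}}{B^2} = 0 \qquad (r = 1, \ldots, k)$$

which is equivalent to

$$\frac{\partial B}{\partial\lambda_r} = \frac{1}{G}\,\frac{\partial A}{\partial\lambda_r} \qquad (r = 1, \ldots, k) \tag{19}$$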

For ease of differentiating, it is convenient to write out B in the form

$$\begin{aligned}
B = {} & \lambda_1\lambda_1 S_{11} + \cdots + \lambda_1\lambda_r S_{1r} + \cdots + \lambda_1\lambda_k S_{1k} \\
& + \cdots \\
& + \lambda_r\lambda_1 S_{r1} + \cdots + \lambda_r\lambda_r S_{rr} + \cdots + \lambda_r\lambda_k S_{rk} \\
& + \cdots \\
& + \lambda_k\lambda_1 S_{k1} + \cdots + \lambda_k\lambda_r S_{kr} + \cdots + \lambda_k\lambda_k S_{kk}
\end{aligned}$$

It will be observed that $\lambda_r$ occurs as a common factor of both the rth
row and the rth column. Since $S_{ij} = S_{ji}$, it therefore follows that

$$\frac{\partial B}{\partial\lambda_r} = 2[\lambda_1 S_{r1} + \cdots + \lambda_k S_{rk}]$$

Similarly,

$$\frac{\partial A}{\partial\lambda_r} = 2[\lambda_1 d_r d_1 + \cdots + \lambda_k d_r d_k] = 2[\lambda_1 d_1 + \cdots + \lambda_k d_k]\,d_r$$

If these expressions are inserted in (19), it will reduce to

$$\lambda_1 S_{r1} + \lambda_2 S_{r2} + \cdots + \lambda_k S_{rk} = c\,d_r \qquad (r = 1, \ldots, k) \tag{20}$$

where $c = [\lambda_1 d_1 + \cdots + \lambda_k d_k]/G$ is independent of r.

Since

$$S_{pq} = \sum_{i=1}^{2}\sum_{j=1}^{n_i}(x_{pij} - \bar{x}_{pi})(x_{qij} - \bar{x}_{qi}) \tag{21}$$

and

$$d_p = \bar{x}_{p1} - \bar{x}_{p2} \tag{22}$$

are numerical quantities in any given problem, the necessary conditions
(20) constitute a set of k linear equations in the $\lambda$'s. The solution
of these equations determines the $\lambda$'s except for the unknown factor c.
From (14) it is clear that such a factor can be ignored because the two
sets of z's would merely differ by this constant factor and thus would
be equivalent as far as discriminating between the two groups is
concerned. As a matter of fact, it is usually convenient to choose c = 1,
solve the equations, and then reduce (14) to the form in which one
of the $\lambda$'s, say $\lambda_1$, is unity.

As an illustration of the use of this function, consider the data of
Table 1 on the mean numbers of teeth found on the proximal ($x_1$) and
distal ($x_2$) combs of two races of insects. The problem here is to
discriminate between members of the two races by means of the two
indicated variables.

TABLE 1

Race A
  $x_1$   6.36  5.92  5.92  6.44  6.40  6.56  6.64  6.68  6.72  6.76  6.72
  $x_2$   5.24  5.12  5.36  5.64  5.16  5.56  5.36  4.96  5.48  5.60  5.08

Race B
  $x_1$   6.00  5.60  5.64  5.76  5.96  5.72  5.64  5.44  5.04  4.56  5.48  5.76
  $x_2$   4.88  4.64  4.96  4.80  5.08  5.04  4.96  4.88  4.44  4.04  4.20  4.80


Computations give $S_{11} = 2.68$, $S_{12} = 1.29$, $S_{22} = 1.75$, $d_1 = 0.915$,
$d_2 = 0.597$; consequently if c is chosen equal to 1, equations (20)
become

$$2.68\lambda_1 + 1.29\lambda_2 = 0.915$$
$$1.29\lambda_1 + 1.75\lambda_2 = 0.597$$

The solution of these equations is $\lambda_1 = 0.274$ and $\lambda_2 = 0.139$. If these
values are used, the linear discriminant function (14) becomes

$$z = 0.274x_1 + 0.139x_2$$

For the purpose of computing values of z, it is more convenient to
choose c so that either $\lambda_1$ or $\lambda_2$ equals 1. If c is chosen to make $\lambda_1$
equal 1, this discriminant function reduces to

$$z = x_1 + 0.507x_2$$

The values of z corresponding to the various members of the two races
given in Table 1 are as follows:


Race A   9.02  8.52  8.64  9.30  9.02  9.38  9.36  9.19  9.50  9.60  9.30

Race B   8.47  7.95  8.15  8.19  8.54  8.28  8.15  7.91  7.29  6.61  7.61  8.19


It will be noted that the two races are segregated by means of z except 
for the slight overlap found in the second entry for Race A and the fifth 
entry for Race B. 
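
The computations of this illustration can be retraced with a short program. The sketch below is only one way of arranging the arithmetic: it forms $S_{pq}$ and $d_p$ from (21) and (22), solves the two equations (20) with c = 1 by determinants, and rescales so that $\lambda_1 = 1$. The data lists are read from Table 1, with the damaged entries of the printed table taken at the values that agree with the quoted $d_1$, $d_2$, and z values; the function and variable names are illustrative. Because the text works with quantities rounded to three figures, the unrounded results agree with the printed ones only approximately.

```python
def mean(values):
    return sum(values) / len(values)

def within_sum(groups, p, q):
    """S_pq of (21): pooled within-group sum of cross products."""
    total = 0.0
    for group in groups:
        mean_p = mean([row[p] for row in group])
        mean_q = mean([row[q] for row in group])
        total += sum((row[p] - mean_p) * (row[q] - mean_q) for row in group)
    return total

# (x1, x2) pairs from Table 1.
race_a = [(6.36, 5.24), (5.92, 5.12), (5.92, 5.36), (6.44, 5.64),
          (6.40, 5.16), (6.56, 5.56), (6.64, 5.36), (6.68, 4.96),
          (6.72, 5.48), (6.76, 5.60), (6.72, 5.08)]
race_b = [(6.00, 4.88), (5.60, 4.64), (5.64, 4.96), (5.76, 4.80),
          (5.96, 5.08), (5.72, 5.04), (5.64, 4.96), (5.44, 4.88),
          (5.04, 4.44), (4.56, 4.04), (5.48, 4.20), (5.76, 4.80)]
groups = [race_a, race_b]

# d_p of (22): difference of the group means of each variable.
d = [mean([row[p] for row in race_a]) - mean([row[p] for row in race_b])
     for p in range(2)]
S = [[within_sum(groups, p, q) for q in range(2)] for p in range(2)]

# Solve equations (20) with c = 1 by determinants (two unknowns).
determinant = S[0][0] * S[1][1] - S[0][1] * S[1][0]
lam1 = (d[0] * S[1][1] - S[0][1] * d[1]) / determinant
lam2 = (S[0][0] * d[1] - S[1][0] * d[0]) / determinant

# Rescale so that lambda_1 = 1 and compute the index z for every individual.
ratio = lam2 / lam1
z_a = [x1 + ratio * x2 for x1, x2 in race_a]
z_b = [x1 + ratio * x2 for x1, x2 in race_b]
print(S, d, lam1, lam2, ratio)
```

For more than two variables the same equations (20) would be solved by any standard method for linear systems; determinants are used here only because k = 2.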

EXERCISES


1. The following values represent sample values for 450 college students in
which the three variables represent honor points, general intelligence scores, and
hours of study. (a) Find the regression equation for estimating honor points.
(b) By means of the multiple correlation coefficient, determine whether this
regression equation would enable one to predict grade point averages with a fair
degree of success. (c) Find all three partial correlations, and interpret them in the
light of the corresponding simple correlations.

$\bar{X}_1 = 18.5$, $\bar{X}_2 = 100.6$, $\bar{X}_3 =$
$s_1 = 11.2$, $s_2 = 15.8$, $s_3 =$
$r_{12} = 0.60$, $r_{13} = 0.32$, $r_{23} =$


2. The following correlation coefficients are for the variables: average grade
the first semester at college, arithmetic test score, average grade in high-school
work, and student interest breadth. Find and interpret those quantities which
would be of particular interest.

1    0.465   0.546   0.365
     1       0.401   0.197
             1       0.345
                     1


3. Show that

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$$

4. Using problem 3, find limits for the possible values of $r_{23}$ in each case if
$r_{12}$ and $r_{13}$ have the following pairs of values:

$r_{12}$ = 0, +1, +1, +0.5, +0.5
$r_{13}$ = 0, +1, −1, +0.5, −0.5

5. If the relation $aX_1 + bX_2 + cX_3 = 0$ holds for all values of $X_1$, $X_2$, and
$X_3$, what are the values of the partial correlations?

6. Is there anything in the derivation of the formulas of this chapter which
would not permit the variables to be $X_1 = y$, $X_2 = t$, $X_3 = t^2$, etc., and hence to
yield formulas for polynomial curve fitting?

7. Show that the simple correlation coefficient between $X_1$ and $X_1'$ is the
multiple correlation coefficient. Use the same type of technique as that employed in
deriving the formula for s 2 . 

8. For the case of three variables, show that the simple correlation coefficient
between the deviations $X_1 - X_1'$ and $X_2 - X_2'$, in which the prime indicates the
regression value on the variable $X_3$, is the partial correlation coefficient $r_{12.3}$.

9. Classify the first 16 individuals in problem 3, Chapter V, into one of two
groups on the basis of having a G.P.A. less than or greater than 0.9. (a) Using
the remaining variables, find the equation of the discriminant function for
segregating individuals into the proper G.P.A. group. (b) Calculate the values of z
for the selected individuals, and note whether the discriminant function
discriminates noticeably better than either variable alone.

10. For the case of three variables show that $R_{11} \geq R \geq 0$.

EXPECTED VALUES


By the expected value of a function of a statistical variable is meant
its mean value. The expression arose from its connection with the
amount of money one could expect to win at a game of chance. If the
expected value is denoted by E, then by (7), Chapter III, the expected
value of a function g(x) is

$$E[g(x)] = \int_a^b g(x)f(x)\,dx$$

where f(x) is the distribution function of x. For a discrete variable
this integral would be replaced by a sum. It is clear from a direct
application of this definition that

$$E[cg(x)] = cE[g(x)] \tag{1}$$

where c is a constant, and that

$$E[g_1(x) + g_2(x)] = E[g_1(x)] + E[g_2(x)] \tag{2}$$

where $g_1(x)$ and $g_2(x)$ are any two functions of x possessing expected
values.


1. Unbiased Estimate of $\sigma^2$ Based on One Sample Variance

Consider the expected value of a sample variance based upon a
sample of size n. From properties (1) and (2) and the definition of
$\sigma^2$, it follows that

$$\begin{aligned}
E[s^2] &= E\left[\frac{1}{n}\sum(x - \bar{x})^2\right] \\
&= E\left[\frac{1}{n}\sum\{(x - m) - (\bar{x} - m)\}^2\right] \\
&= \frac{1}{n}\sum E[(x - m)^2] - E[(\bar{x} - m)^2] \\
&= \frac{1}{n}\sum\sigma^2 - \sigma_{\bar{x}}^2 = \sigma^2 - \frac{\sigma^2}{n} = \frac{n - 1}{n}\,\sigma^2
\end{aligned} \tag{3}$$


This shows that if repeated samples of size n are taken and if the
resulting sample variances are averaged, the average will not approach the
true variance in value but will be consistently too small by the factor
of $(n - 1)/n$. For small samples this factor becomes important;
consequently one must be careful how he combines samples in making
an estimate of the true variance. In order to overcome this defect of $s^2$
as an estimate of $\sigma^2$, it is merely necessary to multiply $s^2$ by $n/(n - 1)$
and use the resulting quantity as the estimate of $\sigma^2$. Then

$$E\left[\frac{n}{n - 1}\,s^2\right] = \frac{n}{n - 1}\,E[s^2] = \sigma^2$$


When the expected value of a statistic is equal to the population
parameter of which it is intended as an estimate, the statistic is called
an unbiased estimate of the parameter. It is clear that $s^2$ is biased,
whereas $\frac{n}{n - 1}s^2$ is unbiased. In this chapter an unbiased estimate
will be indicated by placing a circumflex over the parameter of which
it is an unbiased estimate. Therefore,

$$\widehat{\sigma^2} = \frac{\sum(x - \bar{x})^2}{n - 1} \tag{4}$$

Thus, it appears that one can avoid the bias in estimating variances
by dividing the sum of squares of deviations by $n - 1$ rather than
by n as was the practice with large samples. It is because of this
property that some authors define the sample variance, $s^2$, to be
$\sum(x - \bar{x})^2/(n - 1)$. It should be remarked that $\widehat{\sigma^2}$ is not the same
as $(\hat{\sigma})^2$, which denotes the square of an unbiased estimate of $\sigma$.
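
The result (3) can be verified exactly for a small discrete population by listing every possible sample. The sketch below assumes a hypothetical population of four equally likely values and samples of size 2 drawn independently (that is, with replacement); exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: four equally likely values.
population = [1, 2, 3, 4]
n = 2  # sample size

N = len(population)
m = Fraction(sum(population), N)
sigma2 = sum((Fraction(x) - m) ** 2 for x in population) / N

def s2(sample):
    """Sample variance with divisor n, as s^2 is defined in the text."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / len(sample)

# Every ordered sample of size n is equally likely under independent draws.
samples = list(product(population, repeat=n))
expected_s2 = sum(s2(sample) for sample in samples) / len(samples)
expected_unbiased = Fraction(n, n - 1) * expected_s2

print(sigma2, expected_s2, expected_unbiased)
```

The average of $s^2$ over all samples falls short of $\sigma^2$ by exactly the factor $(n - 1)/n$, and multiplying by $n/(n - 1)$ removes the bias.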

2. Unbiased Estimate of $\sigma^2$ Based on Several Sample Variances

It often happens that several sample variances are available for
estimating the population variance. In order to obtain an average
value from these sample values that will make allowance for
differences in precision due to differences in sample sizes, a weighted average
of some kind is necessary. Furthermore, it is important with small
samples to select a weighting that is free of bias. Although various
weightings can be designed which lack bias, the simplest scheme would
seem to be to weight each variance with the size of the sample on which
it is based and then multiply by a factor that will make the result
unbiased. If k sample variances are available, such a choice leads to

$$\widehat{\sigma^2} = \frac{n_1 s_1^2 + n_2 s_2^2 + \cdots + n_k s_k^2}{n_1 + n_2 + \cdots + n_k - k} \tag{5}$$

From (3) it follows that $E[n_i s_i^2] = (n_i - 1)\sigma^2$; consequently, if
properties (1) and (2) are applied to (5), it will be found that $E[\widehat{\sigma^2}] = \sigma^2$
and therefore that this estimate is unbiased as indicated.

Formula (5) is particularly useful when several sets of data are
available for which the variability can be assumed to be constant but
for which the means vary more than could be reasonably attributed to
chance. If such sets of data were combined into one set and the
variance of this entire set used as an estimate of the population variance,
the result would be to overestimate the value of $\sigma^2$. However, if one
calculates the variance of each set and combines as in (5), the result
will be unbiased.
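
Formula (5) may be sketched as follows. The two data sets are hypothetical and are chosen so that their means differ widely while their variability is similar; the function name `pooled_variance` is illustrative. The variance of the combined set is computed alongside to show the overestimate described above.

```python
def variance_n(values):
    """Sample variance with divisor n, as s^2 is defined in the text."""
    mean_value = sum(values) / len(values)
    return sum((v - mean_value) ** 2 for v in values) / len(values)

def pooled_variance(sizes, variances):
    """Formula (5): sum of n_i * s_i^2 divided by (sum of n_i) - k."""
    numerator = sum(n * s2 for n, s2 in zip(sizes, variances))
    return numerator / (sum(sizes) - len(sizes))

# Hypothetical sets with similar spread but very different means.
set_1 = [10.0, 11.0, 12.0, 13.0]
set_2 = [20.0, 21.0, 22.0, 23.0, 24.0]

sizes = [len(set_1), len(set_2)]
variances = [variance_n(set_1), variance_n(set_2)]

pooled = pooled_variance(sizes, variances)
combined = variance_n(set_1 + set_2)  # ignores the difference in means
print(pooled, combined)
```

Here the combined variance is many times larger than the pooled value because it absorbs the difference between the two means.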


CONFIDENCE LIMITS 

The problems that were considered in the chapter on large-sample 
theory were largely of the hypothesis-testing type. For example, a 
sample mean was tested to see whether it might reasonably have come 
from a population with a specified mean, or the difference between two 
sample means was tested to see whether these means might reasonably 
have come from two populations with the same means. However, a 
problem that arises just as frequently is the estimating of a population 
parameter, such as the mean or variance. Not only an estimate is 
desired, but also limits within which one can have confidence that the 
population parameter lies. For the purpose of studying this type of 
problem, consider a numerical illustration. 

Suppose that a random sample of size 100 has been taken from a
population that is known to be normal and whose variance is known to
be 16. Suppose further that the mean of this sample is 30. Then the
problem is to estimate the population mean, m, by means of an interval
of values of x. Since $\sigma^2 = 16$, $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 0.4$. Although the value
of m is unknown, it is known from large-sample theory that, for
repeated samples of the type being considered, $\bar{x}$ will be normally
distributed about this value of m with a standard deviation of 0.4;
consequently the fixed interval $m \pm 0.8$ will contain 95% of such
sample means on the average. Since m is unknown, one would be tempted
to replace m by $\bar{x}$, whose value is known to be 30 for this first sample
of 100, and then make the same probability statement as before.
However, it is clear that $\bar{x}$ will change from one sampling experiment
to the next and that conceivably the first value of $\bar{x}$, namely 30, might
be a very poor one for estimating m, so that such probability statements
not only would be false but might give highly incorrect results.
Correct probability statements can be made here in the following manner.
If the interval $\bar{x} \pm 0.8$ is treated as a variable interval, changing with
each sample of 100, then, in repeated sampling, 95% of such intervals
on the average will contain m. This follows from the fact that, if 95%
of sample means lie within 0.8 of m, in 95% of such samples m must
lie within 0.8 of the corresponding $\bar{x}$. The situation is represented
geometrically in Fig. 1. Each point represents an $\bar{x}$ based on a sample
of 100. The upper diagram corresponds to the case in which m is
assumed known and a probability statement is made concerning $\bar{x}$'s.
The lower diagram corresponds to the case in which m is unknown and
the variable intervals $\bar{x} \pm 0.8$ are plotted. If a point lies inside the
95% band of the upper diagram, its interval in the lower diagram must
cover m.

Fig. 1. Illustration of confidence interval methods.

Now in practice only one such $\bar{x}$ is available, so that only the first
point and its corresponding interval of $30 \pm 0.8$ is available. On the
basis of this one experiment, the claim will be made that the interval
$30 \pm 0.8$ contains the population mean m. If for each such
experiment one made the same claim for the interval corresponding to that
experiment, then, in repeated experiments, 95% of such claims would
be true on the average. It is in this sense that correct probability
statements may be made concerning population parameters. The
interval $30 \pm 0.8$ is called a 95% confidence interval for m. If one uses
confidence intervals as above on all estimation problems that arise, on
the average 95% of such confidence intervals will contain the
parameters claimed for them. Confidence intervals enable one to obtain a
useful type of information about population parameters without the
necessity of treating such parameters as statistical variables. It should
be clearly understood that one is merely betting on the correctness of
the rule of procedure when applying the confidence-interval technique
to a given experiment. It will be observed in the following sections of
this chapter that this technique may be applied to various familiar
population parameters such as the variance and the regression slope.
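
The arithmetic of the numerical illustration can be sketched as follows. The half-width 0.8 of the text corresponds to two standard deviations of the mean; the sketch also computes, by means of the normal distribution in Python's standard `statistics` module, the coverage that two standard deviations actually give and the slightly narrower half-width that corresponds to exactly 95%. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 100        # sample size
sigma2 = 16.0  # known population variance
xbar = 30.0    # observed sample mean

# Standard deviation of the sample mean: sigma / sqrt(n).
se = math.sqrt(sigma2 / n)

# Interval used in the text: two standard deviations on either side.
half_width = 2 * se
interval = (xbar - half_width, xbar + half_width)

# Coverage of a two-standard-deviation band under the normal distribution.
coverage = 2 * NormalDist().cdf(2.0) - 1

# Half-width that gives exactly 95% coverage.
exact_half_width = NormalDist().inv_cdf(0.975) * se

print(interval, coverage, exact_half_width)
```

The two-standard-deviation band covers a little more than 95%, which is why the round figure 0.8 serves in the text in place of the more exact value.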

DISTRIBUTION OF A FUNCTION OF A VARIABLE 

In the following sections a great deal of use will be made of a change
of variable in distribution functions. If x denotes the original variable
and y the new variable, this change of variable may be represented
by $x = g(y)$. Since probabilities are given by integrals, the procedure
for making a change of variable for distribution functions is precisely
that for integrals. Thus, in making this change of variable in the
following integral, it is merely necessary to evaluate $dx = g'(y)\,dy$ and
obtain

$$\int_{x_1}^{x_2} f(x)\,dx = \int_{y_1}^{y_2} f(x)g'(y)\,dy \tag{6}$$

where the value of x on the right is replaced by $x = g(y)$ and the
values of $y_1$ and $y_2$ are the values of y corresponding to $x_1$ and $x_2$
for x. Now it is necessary to restrict the function $g(y)$ to be monotonic,
that is, to be a function that either never decreases or never increases;
otherwise $f(x)g'(y)$ would change signs and therefore could not serve
as a distribution function. It is assumed here that $g(y)$ has a
continuous derivative. If $g(y)$ were not monotonic, there would also be the
difficulty of having more than one value of y correspond to some values
of x.


Fig. 2. Illustration of a change of variable for a distribution function.

The geometry of a typical change of variable is indicated in Fig. 2,
in which the first graph is that of $f(x)$ and the second graph is that of
$f(x)g'(y)$. The two shaded areas correspond to the left and right
integrals in (6). In these graphs $g(y)$ is monotonic increasing because
increasing values of x are made to correspond to increasing values of
y. Since the integral on the left side of (6) yields the $P[x_1 < x < x_2]$,
the integral on the right side must do likewise. Because of the
monotonic property of $g(y)$, which insures in this case that y increases with
x, the probability that y will lie in a given interval must equal the
probability that x will lie in its corresponding interval. The monotonic
property sets up a correspondence between intervals on the x and y
axes of Fig. 2 such that y will lie in an interval $(y_1, y_2)$ if and only if
x lies in its corresponding interval $(x_1, x_2)$. Thus, for all intervals
$(x_1, x_2)$,

$$P[x_1 < x < x_2] = P[y_1 < y < y_2]$$

The integral on the right side of (6) must therefore yield the
$P[y_1 < y < y_2]$, and its integrand must be the distribution function of y.
If $g(y)$ had been monotonic decreasing, it would have been necessary
to take the negative value of the integrand before it could have been
treated as a distribution function. In either case the formula

$$f(y) = f(x)\,|g'(y)| \tag{7}$$

with x replaced by $g(y)$, yields the desired distribution function of y.
Although the f notation being used here was explained in Chapter III,
it may be worth repeating that $f(x)$ and $f(y)$ denote the respective
distribution functions of x and y and that these functions will be
different unless $g(y) = y$. Although (7) may appear to be new, it represents
in formula fashion the new integrand which the student of elementary
integral calculus obtains automatically when performing an integration
by means of a substitution to a new variable.

As an illustration, consider the problem of finding the distribution
function of y if $x = y^2$ and $f(x) = e^{-x}$, $x \geq 0$. Since $g(y) = y^2$,
$g'(y) = 2y$, and therefore by (7)

$$f(y) = 2ye^{-y^2}$$

The relationship between these two distribution functions is shown
geometrically in Fig. 3. It should be noted that only positive values
of y are considered here; otherwise there would be two values of y
to each value of x and $g(y)$ would not be monotonic.
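
The equality of probabilities on which (7) rests can be checked numerically for this illustration. The sketch below builds $f(y)$ from $f(x)$ and $g(y)$ according to (7) and compares the probability that x lies in an interval with the probability that y lies in the corresponding interval. The interval endpoints and the simple midpoint integration rule are choices made here for illustration only.

```python
import math

def f_x(x):
    """Distribution function of x in the illustration: e^(-x) for x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0

def g(y):
    """Change of variable x = g(y) = y^2, restricted to y >= 0."""
    return y * y

def g_prime(y):
    return 2 * y

def f_y(y):
    """Formula (7): f(y) = f(x) |g'(y)| with x replaced by g(y)."""
    return f_x(g(y)) * abs(g_prime(y))

def integrate(func, a, b, steps=100000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

# Corresponding intervals: x in (0.25, 4) if and only if y in (0.5, 2).
x1, x2 = 0.25, 4.0
y1, y2 = 0.5, 2.0

p_x = integrate(f_x, x1, x2)
p_y = integrate(f_y, y1, y2)
exact = math.exp(-x1) - math.exp(-x2)
print(p_x, p_y, exact)
```

Integrating `f_y` over all positive y gives unity in the same way, so that (7) yields a proper distribution function.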


THE $\chi^2$ DISTRIBUTION

1. Moment-Generating Function of $\chi^2$

One of the most widely used continuous distribution functions in
statistical work is the $\chi^2$ function. This function received its name in
connection with early work by K. Pearson on the problem of measuring
the goodness of fit of frequency curves. Its application to such
problems will be treated in Chapter X. The $\chi^2$ function arises in this chapter
in connection with the problem of finding the distribution function of
$s^2$. This function is defined in terms of the variable $\chi^2$ by

$$f(\chi^2) = \frac{(\chi^2)^{\frac{\nu - 2}{2}}\,e^{-\frac{\chi^2}{2}}}{2^{\frac{\nu}{2}}\,\Gamma\left(\frac{\nu}{2}\right)} \tag{8}$$

in which the parameter $\nu$ is called the number of degrees of freedom and
$\Gamma$ represents the gamma function defined in the section on moments in
Chapter III. A sketch of (8) for several values of $\nu$ is given in Fig. 4.

Fig. 4. Distribution of $\chi^2$ for various degrees of freedom.


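The moment-generating function of $\chi^2$ will be needed in the next
three sections; therefore its derivation will be considered next. For
convenience of notation let $v = \chi^2$; then the moment-generating
function of v will be given by

$$\begin{aligned}
M_v(\theta) &= \int_0^\infty e^{\theta v}f(v)\,dv \\
&= \frac{1}{2^{\frac{\nu}{2}}\Gamma\left(\frac{\nu}{2}\right)}\int_0^\infty e^{\theta v}\,v^{\frac{\nu - 2}{2}}e^{-\frac{v}{2}}\,dv \\
&= \frac{1}{2^{\frac{\nu}{2}}\Gamma\left(\frac{\nu}{2}\right)}\int_0^\infty e^{-\frac{v}{2}(1 - 2\theta)}\,v^{\frac{\nu}{2} - 1}\,dv
\end{aligned}$$

Let $y = \frac{v}{2}(1 - 2\theta)$; then $dv = \frac{2}{1 - 2\theta}\,dy$ and

$$\begin{aligned}
M_v(\theta) &= \frac{1}{2^{\frac{\nu}{2}}\Gamma\left(\frac{\nu}{2}\right)}\int_0^\infty e^{-y}\left(\frac{2y}{1 - 2\theta}\right)^{\frac{\nu}{2} - 1}\frac{2}{1 - 2\theta}\,dy \\
&= \frac{(1 - 2\theta)^{-\frac{\nu}{2}}}{\Gamma\left(\frac{\nu}{2}\right)}\int_0^\infty y^{\frac{\nu}{2} - 1}e^{-y}\,dy
\end{aligned}$$

Since the integral on the right is that for $\Gamma\left(\frac{\nu}{2}\right)$, the moment-generating
function of $\chi^2$ for $\nu$ degrees of freedom is given by

$$M_{\chi^2}(\theta) = (1 - 2\theta)^{-\frac{\nu}{2}} \tag{9}$$
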
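
Formula (9) may be checked by evaluating the defining integral numerically. The sketch below integrates $e^{\theta v}f(v)$, with $f$ given by (8), by a simple midpoint rule and compares the result with $(1 - 2\theta)^{-\nu/2}$. The values of $\nu$ and $\theta$, the upper limit of integration, and the step count are illustrative choices; $\theta$ must be less than 1/2 for the integral to converge.

```python
import math

def chi2_density(v, nu):
    """Formula (8) with v written for chi-square and nu degrees of freedom."""
    return (v ** (nu / 2 - 1) * math.exp(-v / 2)
            / (2 ** (nu / 2) * math.gamma(nu / 2)))

def mgf_numeric(theta, nu, upper=200.0, steps=200000):
    """Midpoint-rule value of the integral of e^(theta v) f(v) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * h
        total += math.exp(theta * v) * chi2_density(v, nu)
    return h * total

def mgf_closed(theta, nu):
    """Formula (9)."""
    return (1 - 2 * theta) ** (-nu / 2)

nu, theta = 4, 0.2
numeric = mgf_numeric(theta, nu)
closed = mgf_closed(theta, nu)
print(numeric, closed)
```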

2. Distribution of a Sum of Squares 

Let x be normally distributed with zero mean and unit variance. 
Suppose that a random sample of size n has been drawn from this 
population. These sample values will be denoted by x\ } x 2 , • • x n . 
The object of this section is to find the distribution function of w y 
where 

n 

(10) w =y ~/, 2 

The derivation of f(w) will be accomplished by means of its moment¬ 
generating function. 

Since the sampling is random, the x t are independent and possess 
the same distribution function; hence 

M w (0) = M XI 2 + . .+ Xn 2(0) 

= M Xl ^{6) M ^ 2 ( 0 ) • • -M^iO) 

= M^ n {6) 


( 11 ) 
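Now $x$ is a standard normal variable; therefore

$$\begin{aligned}
M_{x^2}(\theta) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\theta x^2} e^{-\frac{x^2}{2}}\,dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}(1-2\theta)}\,dx
\end{aligned}$$

Let $y = x\sqrt{1 - 2\theta}$; this integral reduces to

$$\begin{aligned}
M_{x^2}(\theta) &= (1-2\theta)^{-\frac{1}{2}}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\,dy \\
&= (1-2\theta)^{-\frac{1}{2}}
\end{aligned}$$

From this result and (11) it therefore follows that

$$M_w(\theta) = (1 - 2\theta)^{-\frac{n}{2}} \tag{12}$$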

A comparison of this result with formula (9) will show that $w$ has the
moment-generating function of $\chi^2$ with $n$ degrees of freedom. Since a
distribution function is uniquely determined by its moment-generating
function, the preceding derivation proves the following theorem.

Theorem I. If $x$ is normally distributed with zero mean and unit
variance, the sum of the squares of $n$ random sample values of $x$ has a $\chi^2$
distribution with $n$ degrees of freedom.
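Theorem I lends itself to a quick simulation check. The sketch below is an illustration under stated assumptions, not part of the original text: it draws repeated samples of standard normal values, forms $w$ as in (10), and compares the sample mean and variance of $w$ with $n$ and $2n$, the mean and variance of $\chi^2$ with $n$ degrees of freedom (values that follow from differentiating (9)). The seed, the sample size, and the number of trials are arbitrary choices.

```python
import random

def sum_of_squares_samples(n, trials, seed=0):
    """Return `trials` simulated values of w = x_1^2 + ... + x_n^2, x_i standard normal."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)]

n = 5
w = sum_of_squares_samples(n, trials=20_000)
mean_w = sum(w) / len(w)
var_w = sum((value - mean_w) ** 2 for value in w) / len(w)

# For chi-square with n degrees of freedom the mean is n and the variance is 2n.
print(mean_w, var_w)
```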

3. Distribution of $s^2$

The theorem that was just demonstrated is the basis for deriving
the distribution functions of many useful statistical variables. In
particular, it can be used to derive the distribution function of $s^2$
when the basic variable is normally distributed. To this end, let $x$
be normally distributed with mean $m$ and variance $\sigma^2$, and let
$\bar{x}$ and $s^2$ be the sample values of these parameters based upon a
random sample of size $n$.

Now it can be shown, but only with considerable difficulty, that $\bar{x}$
and $s^2$ are independently distributed when the basic variable is
normally distributed. This fact will be assumed here. Consider

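$$\begin{aligned}
s^2 &= \frac{1}{n} \sum (x - \bar{x})^2 \\
&= \frac{1}{n} \sum \left[(x - m) - (\bar{x} - m)\right]^2 \\
&= \frac{1}{n} \sum (x - m)^2 - (\bar{x} - m)^2
\end{aligned}$$

Because of the convenience of working with standard units, this
relationship will be multiplied through by $n/\sigma^2$ and then written in
the form

$$\frac{ns^2}{\sigma^2} + \left(\frac{\bar{x} - m}{\sigma/\sqrt{n}}\right)^2 = \sum \left(\frac{x - m}{\sigma}\right)^2$$

or symbolically as

$$J + K = L$$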


If the moment-generating function of both sides is taken,

$$M_{J+K}(\theta) = M_L(\theta) \tag{13}$$

Because of the previously mentioned independence of $\bar{x}$ and $s^2$, it
follows that $J$ and $K$ are independently distributed. Although this
seems clear from the meaning of independence, it can easily be
demonstrated by showing that the distribution function of $J$ and $K$ is
the product of its marginal distribution functions, under the assumption
that the distribution function of $\bar{x}$ and $s^2$ is the product of its
marginal distribution functions. Because of this independence, the
function on the left side of (13) can be factored; therefore

$$M_J(\theta)\, M_K(\theta) = M_L(\theta)$$

Since $L$ is the sum of squares of $n$ sample values of a normal variable
with zero mean and unit variance, it has the properties of $w$ in (10);
consequently it has the same moment-generating function as $w$.
Therefore,

$$M_J(\theta)\, M_K(\theta) = M_w(\theta)$$

Since $s^2$ is the variable of interest here, this equation will be written
in the form

$$M_J(\theta) = \frac{M_w(\theta)}{M_K(\theta)}$$


Both the functions on the right can be evaluated by means of (12).
Since

$$\frac{\bar{x} - m}{\sigma/\sqrt{n}}$$

is a normal variable with zero mean and unit variance, its square can be
treated as a special case of $w$ in (10) for which $n = 1$. It therefore
follows from (12) that

$$M_{\frac{ns^2}{\sigma^2}}(\theta) = \frac{(1-2\theta)^{-\frac{n}{2}}}{(1-2\theta)^{-\frac{1}{2}}} = (1-2\theta)^{-\frac{n-1}{2}}$$


Since a distribution function is uniquely determined by its
moment-generating function, this result together with (9) proves the
following theorem.

Theorem II. If $x$ is normally distributed with variance $\sigma^2$ and $s^2$ is
the sample variance based on a random sample of size $n$, then $ns^2/\sigma^2$ has
a $\chi^2$ distribution with $n - 1$ degrees of freedom.
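The loss of one degree of freedom in Theorem II can be seen in a simulation. The following sketch is illustrative only: the population mean and standard deviation, the seed, and the trial count are assumed values. It uses the divisor-$n$ sample variance of the text and checks that the average of $ns^2/\sigma^2$ is close to $n - 1$ rather than $n$.

```python
import random

def scaled_variance_samples(n, m, sigma, trials, seed=1):
    """Simulate n*s^2/sigma^2, where s^2 is the sample variance with divisor n."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        xs = [rng.gauss(m, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        ns2 = sum((x - x_bar) ** 2 for x in xs)  # this is n * s^2
        values.append(ns2 / sigma ** 2)
    return values

n = 6
samples = scaled_variance_samples(n, m=10.0, sigma=3.0, trials=20_000)
average = sum(samples) / len(samples)

# Theorem II predicts a chi-square variable with n - 1 = 5 degrees of freedom,
# whose mean is 5.
print(average)
```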

Two applications of this important theorem will be considered 
immediately after the next section. 

4. Additive Nature of $\chi^2$

Situations may arise in which an experimenter has several sets of
data that he would like to combine into one experimental result or
conclusion. Such a situation was indicated in section 2 with regard
to the combining of several sample variances into one good estimate of
$\sigma^2$. If the statistics that are to be added possess independent $\chi^2$
distributions, the distribution of the sum can be found as follows.

Let $\chi_1^2$ and $\chi_2^2$ possess independent $\chi^2$ distributions with $\nu_1$ and
$\nu_2$ degrees of freedom, respectively. Consider the variable
$z = \chi_1^2 + \chi_2^2$. From moment-generating functions and (9), it follows
that

$$\begin{aligned}
M_z(\theta) &= M_{\chi_1^2}(\theta)\, M_{\chi_2^2}(\theta) \\
&= (1-2\theta)^{-\frac{\nu_1}{2}}\, (1-2\theta)^{-\frac{\nu_2}{2}} \\
&= (1-2\theta)^{-\frac{\nu_1+\nu_2}{2}}
\end{aligned}$$

This result demonstrates the following theorem.

Theorem III. If $\chi_1^2$ and $\chi_2^2$ possess independent $\chi^2$ distributions with
$\nu_1$ and $\nu_2$ degrees of freedom, respectively, then $\chi_1^2 + \chi_2^2$ will possess a
$\chi^2$ distribution with $\nu_1 + \nu_2$ degrees of freedom.
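A tail probability gives a sharper check of Theorem III than a mean alone. The sketch below is an illustration under assumptions not made in the text: it draws independent $\chi^2$ variables with 2 and 4 degrees of freedom as gamma variables (shape $\nu/2$, scale 2), and compares the simulated probability that their sum exceeds a cutoff with the exact tail of $\chi^2$ with 6 degrees of freedom. The exact tail uses the standard finite-sum formula for even degrees of freedom; the cutoff, seed, and trial count are arbitrary.

```python
import math
import random

def chi2_draw(rng, nu):
    """One chi-square draw: a gamma variable with shape nu/2 and scale 2."""
    return rng.gammavariate(nu / 2, 2.0)

def tail_even_df(c, nu):
    """P(chi-square > c) for even nu, from the closed-form finite sum."""
    half = nu // 2
    return math.exp(-c / 2) * sum((c / 2) ** j / math.factorial(j) for j in range(half))

rng = random.Random(2)
trials = 40_000
cutoff = 10.0
hits = sum(chi2_draw(rng, 2) + chi2_draw(rng, 4) > cutoff for _ in range(trials))
simulated = hits / trials
exact = tail_even_df(cutoff, 6)  # 2 + 4 degrees of freedom
print(simulated, exact)
```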

APPLICATIONS OF THE $\chi^2$ DISTRIBUTION
1. Confidence Limits for $\sigma^2$

Let $x$ be normally distributed with variance $\sigma^2$, and let $s^2$ be the
sample variance based on a random sample of size $n$. Then 95%
confidence limits for $\sigma^2$ may be obtained in the following manner.

From Table III for $n - 1$ degrees of freedom find two values of $\chi^2$,
namely $\chi_1^2$ and $\chi_2^2$, such that the probability is 0.975 that
$\chi^2 > \chi_1^2$ and such that the probability is 0.025 that $\chi^2 > \chi_2^2$.
Then it follows from Theorem II that the probability is 0.95 that

$$\chi_1^2 < \frac{ns^2}{\sigma^2} < \chi_2^2$$

or that

$$\frac{ns^2}{\chi_2^2} < \sigma^2 < \frac{ns^2}{\chi_1^2} \tag{14}$$

These two numbers constitute 95% confidence limits for $\sigma^2$. From the
discussion in the section on confidence limits, it follows that on the
average 95% of the inequalities of this type that are computed will be
true inequalities. This method, of course, is not restricted to 95%
limits.

As an application, consider the data of problem 10, Chapter IV.
Suppose that an estimate of the variance is desired on the basis of the
first 5 sets of data. If these 25 observations are combined, it will be
found that

$$ns^2 = \sum (x - \bar{x})^2 = 9{,}715$$

A direct application of (14) and Table III will show that 96%
confidence limits for $\sigma^2$ are given by

$$\frac{9{,}715}{40.27} < \sigma^2 < \frac{9{,}715}{11.99}$$

which is equivalent to

$$241 < \sigma^2 < 809$$

It is clear that $\sigma^2$ cannot be estimated with much accuracy for such
a small sample and such variable data.
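The arithmetic of (14) is easy to script. This is a small sketch, with a hypothetical helper name, that takes $ns^2$ and the two tabled $\chi^2$ values quoted above and returns the limits. With the figures as printed, the lower limit evaluates to about 241 and the upper limit to about 810, within a unit of the printed 809.

```python
def variance_confidence_limits(ns2, chi2_upper, chi2_lower):
    """Formula (14): ns^2 / chi2_upper < sigma^2 < ns^2 / chi2_lower.

    chi2_upper and chi2_lower are the tabled right-tail and left-tail values.
    """
    return ns2 / chi2_upper, ns2 / chi2_lower

# Figures quoted in the text for 24 degrees of freedom.
low, high = variance_confidence_limits(9715, 40.27, 11.99)
print(round(low, 1), round(high, 1))
```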

As a second illustration, consider the problem of finding confidence
limits for $\sigma^2$ when several sets of data are available for which the means
differ considerably but for which the variances are expected to be
homogeneous. The data of the first illustration might have been of
this type if some change had been made in the manufacturing process
that would have raised the mean quality of the product without
affecting the variability of the quality. The problem here is closely related
to the problem considered in section 2. As in that problem, the variance
of the combined data would be seriously in error if the means differed
considerably. The proper approach in such a situation will be
illustrated on the sets of data just considered, even though these sets of
data may legitimately be combined as in the first illustration. If each
set of 5 measurements is treated separately, it will be found that

$$\begin{array}{ll}
n_1 s_1^2 = 1{,}185, & n_4 s_4^2 = 1{,}478 \\
n_2 s_2^2 = 1{,}599, & n_5 s_5^2 = 705 \\
n_3 s_3^2 = 4{,}214, & \sum n_i s_i^2 = 9{,}181
\end{array}$$

If it is assumed that each set has the same population variance, $\sigma^2$,
even though they may have different means here, then the $n_i s_i^2/\sigma^2$
will have independent $\chi^2$ distributions with 4 degrees of freedom each.
From Theorem III it follows that $\sum n_i s_i^2/\sigma^2$ will have a $\chi^2$ distribution
with 20 degrees of freedom. Therefore, by (14) and Table III, 96%
confidence limits will be given by

$$\frac{9{,}181}{35.02} < \sigma^2 < \frac{9{,}181}{9.237}$$

or

$$262 < \sigma^2 < 994$$

It will be noted that these confidence limits do not differ appreciably
from those found by the first method. In both these illustrations it is
assumed that the basic variables are normally distributed.
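The pooled calculation can be sketched in a few lines. The sums of squares and the two $\chi^2$ values (35.02 and 9.237 for 20 degrees of freedom) are those quoted in the text; the variable names are illustrative. Each set contributes $n_i - 1$ degrees of freedom, so five sets of five give 20.

```python
# n_i * s_i^2 for the five sets of 5 measurements, as quoted in the text.
set_sums_of_squares = [1185, 1599, 4214, 1478, 705]
set_sizes = [5, 5, 5, 5, 5]

pooled = sum(set_sums_of_squares)
degrees_of_freedom = sum(size - 1 for size in set_sizes)

# Formula (14) applied to the pooled sum with the tabled chi-square values.
low = pooled / 35.02
high = pooled / 9.237
print(pooled, degrees_of_freedom, round(low), round(high))
```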


STUDENT’S t DISTRIBUTION 

Consider the data of Table 1 on the additional hours of sleep gained 
by 10 patients in an experiment with a certain drug. The problem is 
to determine whether these data justify the claim that this drug does 
produce additional sleep. 

TABLE 1

Patient         1     2     3     4     5     6     7     8     9    10
Hours gained   0.7  -1.1  -0.2  +1.2  +0.1   3.4   3.7   0.8   1.8   2.0

Assume that the hours of additional sleep is a normally distributed
variable, and set up the hypothesis that the population mean is zero.
Furthermore, assume that these 10 patients may be treated as a
random sample of size 10 from this population.

If this problem were treated in the traditional large-sample manner
of Chapter IV, the experimenter would use the data of Table 1 to
obtain

$$\bar{x} = 1.24 \quad \text{and} \quad s = 1.45$$

Then he would calculate

$$\tau = \frac{\bar{x} - m}{\sigma/\sqrt{n}} = \frac{\bar{x} - 0}{\sigma/\sqrt{n}} \doteq \frac{1.24\sqrt{10}}{1.45} = 2.70$$

From Table II the probability of obtaining a value of $\tau > 2.70$ is
0.0035; consequently the hypothesis that $m = 0$ would be rejected here
even at the 1% significance level. If possible psychological factors
were under control here, it appears that the drug does increase sleep.

This large-sample method is subject to one serious objection. For a
sample as small as this, the sample standard deviation, $s$, will not be
an accurate estimate of $\sigma$; consequently a serious error may be
introduced in the value of $\tau$ in replacing $\sigma$ by its sample value. In most
applied problems the true standard deviation is unknown. In order
to overcome this defect in the test, it is necessary to consider a new
variable which involves the sample standard deviation rather than the
population standard deviation. Such a consideration will lead to what
is known as Student's $t$ distribution.

To this end, let $u$ be normally distributed with zero mean and unit
variance, let $v^2$ have a $\chi^2$ distribution with $\nu$ degrees of freedom, and
let $u$ and $v$ be independently distributed. Furthermore, let $c$ denote
any and all constants whose specific values are of no interest. This last
notational device eliminates the necessity for writing out the explicit
form of numerous constants in the following derivations.

In the derivation of the $t$ distribution it is necessary to obtain the
distribution function of the variable $v$. Since $v^2 = \chi^2$, the distribution
function of $v \ge 0$ can be obtained from that of $\chi^2$ by considering the
change of variable from $\chi^2$ to $v$ through the relation $\chi^2 = v^2$ and
applying formula (7). Here $\chi^2$ corresponds to $x$, $v$ corresponds to $y$, and
$g(v) = v^2$; consequently

$$f(v) = f(\chi^2)\, 2v$$

If the value of $f(\chi^2)$ from (8) is inserted and $\chi^2$ is replaced by $v^2$, the
distribution function of $v$ reduces to

$$f(v) = c\, v^{\nu-1} e^{-\frac{v^2}{2}}$$

Since $u$ and $v$ are independently distributed and $u$ is a standard normal
variable,

$$\begin{aligned}
f(u, v) &= f(u)\, f(v) \\
&= c\, e^{-\frac{u^2}{2}}\; c\, v^{\nu-1} e^{-\frac{v^2}{2}}
\end{aligned}$$


Now let $t = u\sqrt{\nu}/v$ represent a change of variable from $u$ to $t$ with
$v$ held fixed. Such a procedure is equivalent to making a change of
variable in the first of the two integrations of a double integral. Since
$du = \frac{v}{\sqrt{\nu}}\,dt$ here, it follows from (7) that

$$\begin{aligned}
f(t, v) &= f(u, v)\, \frac{v}{\sqrt{\nu}} \\
&= c\, v^{\nu} e^{-\frac{t^2 v^2}{2\nu}} e^{-\frac{v^2}{2}} \\
&= c\, v^{\nu} e^{-\frac{v^2}{2}\left(1 + \frac{t^2}{\nu}\right)}
\end{aligned}$$


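The purpose of the present section is to find the distribution function
of $t$. It will be recalled from (6), Chapter VI, that this can be
accomplished by integrating out the variable $v$ from the joint distribution
function of $t$ and $v$; hence, with the substitution $z = v\sqrt{1 + t^2/\nu}$,

$$\begin{aligned}
f(t) &= \int_0^\infty f(t, v)\,dv \\
&= c \int_0^\infty v^{\nu} e^{-\frac{v^2}{2}\left(1 + \frac{t^2}{\nu}\right)}\,dv \\
&= c \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}} \int_0^\infty z^{\nu} e^{-\frac{z^2}{2}}\,dz \\
&= c \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}
\end{aligned}$$

since the last integral is merely a constant as far as the variable $t$
is concerned. The preceding derivation proves the following theorem.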

Theorem IV. If $u$ is normally distributed with zero mean and unit
variance and $v^2$ has a $\chi^2$ distribution with $\nu$ degrees of freedom, and if $u$
and $v$ are independently distributed, then the variable

$$t = \frac{u\sqrt{\nu}}{v}$$

has Student's $t$ distribution with $\nu$ degrees of freedom given by

$$f(t) = c \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$$
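The constant $c$ in Theorem IV is left unspecified in the derivation. As a hedged illustration, the sketch below recovers it numerically by requiring the density to integrate to one, and compares the result with the standard gamma-function expression $\Gamma\!\left(\frac{\nu+1}{2}\right) / \left(\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)\right)$, which is quoted here as a known result rather than taken from the text. The integration range and step count are assumed values.

```python
import math

def t_kernel(t, nu):
    """The t-dependent factor of the density in Theorem IV."""
    return (1 + t * t / nu) ** (-(nu + 1) / 2)

def c_numeric(nu, half_width=100.0, steps=200_000):
    """Constant c obtained by making the midpoint-rule area equal to one."""
    h = 2 * half_width / steps
    area = h * sum(t_kernel(-half_width + (k + 0.5) * h, nu) for k in range(steps))
    return 1 / area

def c_gamma(nu):
    """Standard closed form of the normalizing constant (not derived in the text)."""
    return math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))

print(c_numeric(9), c_gamma(9))
```

The truncated range is adequate for moderate $\nu$; for very small $\nu$ the tails are heavy and a wider range would be needed.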

Now consider once more the problem that was introduced at the
beginning of this section in order to see how this theorem can remedy
the defect in the large-sample method of solution. Since $\bar{x}$ is normally
distributed with zero mean, the variable

$$u = \frac{\bar{x}}{\sigma_{\bar{x}}} = \frac{\bar{x}\sqrt{n}}{\sigma}$$

possesses the properties of $u$ in Theorem IV. From Theorem II it
follows that

$$v^2 = \frac{ns^2}{\sigma^2}$$

possesses the properties of $v^2$ in Theorem IV with $\nu = n - 1$. Since
it is known that $\bar{x}$ and $s^2$ are independently distributed, Theorem IV
may be applied to give

$$t = \frac{\bar{x}\sqrt{n-1}}{s} = \frac{1.24\sqrt{9}}{1.45} = 2.57, \qquad \nu = 9$$

From Table IV it will be found that the probability is approximately
0.017 of obtaining a value of $t > 2.57$. This result is also significant
at the 5% significance level.
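The two statistics can be recomputed from the raw data. The sketch below is illustrative: the first nine values are those of Table 1, and the tenth value, 2.0, is an assumption inferred from the reported mean of 1.24 for 10 patients. The standard deviation uses the divisor $n$, as in the text. Because the text rounds $s$ to 1.45 before dividing, the unrounded results differ slightly in the third decimal.

```python
import math

# Hours of sleep gained; the last entry (2.0) is an inferred, assumed value.
hours = [0.7, -1.1, -0.2, 1.2, 0.1, 3.4, 3.7, 0.8, 1.8, 2.0]

n = len(hours)
mean = sum(hours) / n
s = math.sqrt(sum((h - mean) ** 2 for h in hours) / n)  # divisor n, as in the text

tau = mean * math.sqrt(n) / s      # large-sample statistic
t = mean * math.sqrt(n - 1) / s    # Student's t with n - 1 = 9 degrees of freedom
print(round(mean, 2), round(s, 2), round(tau, 2), round(t, 2))
```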

A comparison of this probability of $P = 0.017$ with that of
$P = 0.0035$ obtained by the use of large-sample methods shows that
the large-sample method is not accurate for a sample as small as 10.
It will be found that the large-sample method gives probabilities
that are consistently too small; consequently large-sample methods
will claim significant results more often than is justified. The
explanation for this bias on the part of large-sample methods is that the $t$
distribution has a slightly larger dispersion than the standard normal
distribution. The situation is shown graphically in Fig. 5, which gives
the graphs of the standard normal distribution and Student's $t$
distribution for 4 degrees of freedom.

The important feature of the $t$ distribution is that it does not depend
on any unknown population parameters and hence there is no necessity
for replacing parameter values by questionable sample estimates as
there is in the large-sample normal curve method.

The inaccuracy of the large-sample method could have been reduced
somewhat by using the unbiased estimate of $\sigma^2$; however, for samples
as small as 10 the error would still be considerable.



Fig. 5. Standard normal and Student’s t distributions. 


APPLICATIONS OF THE t DISTRIBUTION 
1. Confidence Limits for a Mean 

Let $x$ be normally distributed with mean $m$ and variance $\sigma^2$. Let
$\bar{x}$ and $s^2$ be their sample values based on a random sample of size $n$.
Then

$$u = \frac{\bar{x} - m}{\sigma/\sqrt{n}} \quad \text{and} \quad v^2 = \frac{ns^2}{\sigma^2}$$

satisfy the requirements of $u$ and $v^2$ in Theorem IV; consequently

$$t = \frac{(\bar{x} - m)\sqrt{n-1}}{s} \tag{15}$$

has a $t$ distribution with $n - 1$ degrees of freedom. If $t_{0.05}$ represents
the value of $t$ for $n - 1$ degrees of freedom such that the probability is
0.05 that $|t| > t_{0.05}$, then the probability is 0.95 that

$$\left|\frac{(\bar{x} - m)\sqrt{n-1}}{s}\right| < t_{0.05}$$

or that

$$\bar{x} - t_{0.05}\,\frac{s}{\sqrt{n-1}} < m < \bar{x} + t_{0.05}\,\frac{s}{\sqrt{n-1}}$$

This inequality determines a 95% confidence interval for $m$. If some
probability other than 0.95 is desired, it is merely necessary to replace
$t_{0.05}$ by the corresponding value of $t$ from Table IV.
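As a worked instance of the interval just derived, the sketch below applies it to the sleep data summarized earlier ($\bar{x} = 1.24$, $s = 1.45$, $n = 10$). The critical value 2.262 for 9 degrees of freedom is an assumption taken from standard $t$ tables, since Table IV is not reproduced here; the helper name is hypothetical.

```python
import math

def mean_confidence_limits(x_bar, s, n, t_crit):
    """Limits x_bar -/+ t_crit * s / sqrt(n - 1), with s computed using divisor n."""
    half_width = t_crit * s / math.sqrt(n - 1)
    return x_bar - half_width, x_bar + half_width

# t_crit = 2.262 is the two-sided 5% value for 9 degrees of freedom (assumed).
low, high = mean_confidence_limits(1.24, 1.45, 10, 2.262)
print(round(low, 3), round(high, 3))
```

The interval excludes zero, which agrees with the earlier finding that the hypothesis $m = 0$ is rejected at the 5% level.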


2. Difference between Two Means 

The $t$ distribution may be used to eliminate the error in large-sample
methods when testing the difference between two means in the
same manner as for testing one mean. Let $x$ and $y$ be normally
distributed with means $m_x$ and $m_y$ and with the same variance, $\sigma^2$. Let
random samples of size $n_x$ and $n_y$ be taken from these two populations.
Denote the sample means and variances by $\bar{x}$, $\bar{y}$, $s_x^2$, and $s_y^2$. Then

$$u = \frac{(\bar{x} - \bar{y}) - (m_x - m_y)}{\sigma_{\bar{x}-\bar{y}}} = \frac{(\bar{x} - \bar{y}) - (m_x - m_y)}{\sigma\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}}$$

will possess the required properties of $u$ in Theorem IV. Furthermore,

$$v^2 = \frac{n_x s_x^2 + n_y s_y^2}{\sigma^2}$$

with $\nu = n_x + n_y - 2$ degrees of freedom, is easily seen to possess the
properties of $v^2$ in Theorem IV. This follows from Theorem II and
Theorem III because

$$\frac{n_x s_x^2}{\sigma^2} \quad \text{and} \quad \frac{n_y s_y^2}{\sigma^2}$$

possess independent $\chi^2$ distributions with $n_x - 1$ and $n_y - 1$ degrees
of freedom, respectively. Consequently,

$$t = \frac{(\bar{x} - \bar{y}) - (m_x - m_y)}{\sqrt{n_x s_x^2 + n_y s_y^2}} \sqrt{\frac{n_x n_y (n_x + n_y - 2)}{n_x + n_y}}, \qquad \nu = n_x + n_y - 2 \tag{16}$$

will have Student's $t$ distribution with $n_x + n_y - 2$ degrees of freedom.
Then, to test the hypothesis that $m_x = m_y$, it is merely necessary to
calculate the value of $t$ and use Table IV to see whether the sample
value of $t$ numerically exceeds the critical value.

It will be noted that the value of $t$ does not depend upon any
population parameters as in the large-sample method explained just after
Theorem III, Chapter IV. It will also be noted, however, that the $t$
test is less general than the large-sample method because here it is
necessary to assume equality of the variances, which was not true for
the large-sample approach. In a later section the problem of testing
for the equality of variances will be considered so that this assumption
can be checked for its reasonableness. A method is available that does
not require equality of the variances; however, its theory is too
advanced to be considered here.

Formula (16) may also be used to determine confidence limits for
$m_x - m_y$. If it has been shown that the hypothesis of $m_x = m_y$ is not
reasonable, it is of interest to know how large or how small a difference
is reasonable. For a given probability, confidence limits will give the
minimum and maximum differences.

As a numerical illustration, consider the data of Table 2 on the yield
of corn in bushels on 10 pairs of plots in which plot one of each pair
received some phosphorus as a fertilizer.


TABLE 2

Plot 1   6.2  5.7  6.5  6.0  6.3  5.8  5.7  6.0  6.0  5.8
Plot 2   5.6  5.9  5.6  5.7  5.8  5.7  6.0  5.5  5.7  5.5


It will be assumed that all pairs of plots were treated alike except
for the addition of phosphorus to half of them and that the yield of
corn may be treated as a normal variable. If $x$ and $y$ correspond to
plots 1 and 2, respectively, it will also be assumed that $\sigma_x = \sigma_y$. Now
set up the hypothesis that $m_x = m_y$. Calculations here give

$$\begin{array}{ll}
\bar{x} = 6.0, & n_x s_x^2 = 0.64 \\
\bar{y} = 5.7, & n_y s_y^2 = 0.24
\end{array}$$

When (16) is applied,

$$t = \frac{0.3}{\sqrt{0.64 + 0.24}} \sqrt{\frac{100(18)}{20}} = 3.03, \qquad \nu = 18$$
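The value of $t$ can be reproduced directly from Table 2. The sketch below is a minimal illustration of formula (16); the function names are hypothetical, and the argument `delta` stands for the hypothesized value of $m_x - m_y$, which is zero under the hypothesis being tested.

```python
import math

plot1 = [6.2, 5.7, 6.5, 6.0, 6.3, 5.8, 5.7, 6.0, 6.0, 5.8]
plot2 = [5.6, 5.9, 5.6, 5.7, 5.8, 5.7, 6.0, 5.5, 5.7, 5.5]

def mean(values):
    return sum(values) / len(values)

def sum_sq_dev(values):
    """Sum of squared deviations from the mean, i.e. n * s^2 in the text's notation."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def two_sample_t(x, y, delta=0.0):
    """Formula (16); returns t and its degrees of freedom."""
    nx, ny = len(x), len(y)
    pooled = sum_sq_dev(x) + sum_sq_dev(y)
    scale = math.sqrt(nx * ny * (nx + ny - 2) / (nx + ny))
    return (mean(x) - mean(y) - delta) / math.sqrt(pooled) * scale, nx + ny - 2

t, nu = two_sample_t(plot1, plot2)
print(round(t, 2), nu)
```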


From Table IV the 1% critical value of $t = 2.55$, only the right tail
being considered; consequently this result is highly significant, and
the hypothesis of no increase in mean yield will be discarded. If the
assumptions of normality and equality of variances are reasonable so
that the experimenter can justifiably claim that this significant
difference is due to a real difference in the population means, it becomes of
interest to know how large a real difference is likely. Confidence limits
for $m_x - m_y$ will give the desired information. The same calculations
as above give

$$t = \frac{0.3 - (m_x - m_y)}{0.0989}$$

Then 95% confidence limits are given by

$$\left|\frac{0.3 - (m_x - m_y)}{0.0989}\right| < 2.101$$

which reduces to

$$0.092 < m_x - m_y < 0.508$$

From this result it is clear that, for a sample as small as 10, one cannot
promise with any great degree of certainty more than about a 2%
increase in yield due to the addition of the phosphorus.
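The divisor 0.0989 and the resulting limits follow from (16) by simple arithmetic. The sketch below, with a hypothetical helper name, recomputes them from the pooled sum of squares 0.88, the sample sizes, and the critical value 2.101 quoted in the text.

```python
import math

def difference_limits(diff, pooled_ss, nx, ny, t_crit):
    """Confidence limits for m_x - m_y obtained by inverting formula (16)."""
    divisor = math.sqrt(pooled_ss) / math.sqrt(nx * ny * (nx + ny - 2) / (nx + ny))
    return diff - t_crit * divisor, diff + t_crit * divisor, divisor

low, high, divisor = difference_limits(0.3, 0.64 + 0.24, 10, 10, 2.101)
print(round(divisor, 4), round(low, 3), round(high, 3))

# The lower limit as a fraction of the unfertilized mean yield of 5.7.
print(round(100 * low / 5.7, 1))
```

The last line expresses the lower limit relative to the mean of plot 2, which is the sense in which the text speaks of an increase of about 2%.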


3. Confidence Limits for a Regression Coefficient 

The problem to be considered in this section is determining whether
the difference between the slopes of a sample and a theoretical
regression line might reasonably be due to sampling variation. Let $X$ and $Y$
denote the two variables, and let $X_i$ and $Y_i$ $(i = 1, 2, \cdots, n)$ denote
their sample values for a random sample of size $n$. Then the
corresponding small letter will be used to represent the variable measured
from its mean. With this notation, the equation of the least-squares
regression line as given by (3), Chapter V, is $y' = bx$, where

$$b = \frac{\sum x_i y_i}{\sum x_i^2}$$

Now let repeated random samples of size $n$ be selected such that
precisely the same set of $X$ values as the original set is obtained each
time. This restriction means that the $X_i$ are not treated as variables
once the first sample of $n$ has been taken. As a matter of fact, the
original set of $X$'s need not be selected at random. Then it will be
assumed that the $Y_i$ are normally distributed about a true regression
line, whose equation will be written

$$Y' = \alpha + \beta X$$





with the same variance, $\sigma^2$, for all $Y_i$. It should be noted that $X_i$ $(i = 1,
\cdots, n)$ is fixed but that to each sample of $n$ there corresponds one value
of $Y_i$, so that $Y_i$ is a statistical variable. The assumptions made here
concerning the variables $Y_i$ may seem rather stringent; however, it
may be recalled that it was shown in Chapter VI that the properties
assumed here are possessed if $X$ and $Y$ are jointly normally distributed.

For simplicity of notation, let

$$w_i = \frac{x_i}{\sum x_i^2} \tag{17}$$

Then

$$b = \sum w_i Y_i$$

Since the $x_i$ are fixed, they may be treated as constants; consequently
$b$ may be treated as a linear function of the $Y_i$. Now the $Y_i$ are
statistically independent and have the same variances; therefore it is
readily shown that

$$\sigma_b^2 = \sum w_i^2 \sigma_{Y_i}^2 = \sigma^2 \sum w_i^2$$

Substitute (17); then

$$\sigma_b^2 = \frac{\sigma^2}{\sum x_i^2}$$

By means of the methods employed in Chapter IV in the section devoted
to the distribution of $\bar{x}$ from a normal distribution, it is easy to show
that a linear combination of independent normal variables is normally
distributed. Since the $Y_i$ are independent normal variables and $b$ is a
linear combination of them,

$$u = \frac{b - \beta}{\sigma_b} = \frac{(b - \beta)\sqrt{\sum x_i^2}}{\sigma}$$

possesses the properties of $u$ in Theorem IV. Furthermore, it can be
shown with considerably more difficulty that the measure of variation

$$v^2 = \frac{\sum (Y_i - Y_i')^2}{\sigma^2}, \qquad \nu = n - 2$$

where $Y_i'$ represents the sample linear regression estimate of $Y_i$,
possesses the properties of $v^2$ in Theorem IV with $n - 2$ degrees of
freedom. This fact will be assumed here. Therefore, by Theorem IV,

$$t = (b - \beta) \sqrt{\frac{(n - 2)\sum (X_i - \bar{X})^2}{\sum (Y_i - Y_i')^2}}, \qquad \nu = n - 2 \tag{18}$$

will have Student's $t$ distribution with $n - 2$ degrees of freedom.
Formula (18) can be used to test compatibility between a sample and
a theoretical regression coefficient or to find confidence limits for $\beta$.

As an illustration of (18), consider the data of Table 3 on the
relationship between the thickness of coatings of galvanized zinc by a
standard stripping method, $Y$, and a magnetic method, $X$.

TABLE 3

Y   116  132  104  139  114  129  720  174  312  338  465
X   105  120   85  121  115  127  630  155  250  310  443

If the magnetic method were reliable, it would be preferred because it is a
non-destructive test. If a least-squares line is fitted to this set of
points, its equation will be found to be

$$Y' = -1.79 + 1.12X$$

It will also be found that

$$\sum (X_i - \bar{X})^2 = 301{,}826$$
$$\sum (Y_i - Y_i')^2 = 2{,}766$$

Although the investigator here would undoubtedly be interested in
obtaining a measure of the precision of the magnetic method as a
substitute for the stripping method, that problem can be treated by
the confidence-limits technique of the variance applied to the variance
of the errors of estimate. The problem was introduced here to test
whether the magnetic method was consistent over the range of
thicknesses. It might happen, for example, that the magnetic method
gives too small a reading for thin coatings and too large a reading
for thick coatings. If the method were biased in this direction, the
slope of the regression line would tend to be too large. In this problem,
therefore, set up the hypothesis that $\beta = 1$. If it is assumed that the
necessary conditions for applying (18) are satisfied, then

$$t = 0.12 \sqrt{\frac{9(301{,}826)}{2{,}766}} = 3.76, \qquad \nu = 9$$

From Table IV the 5% critical value of $t$ is 2.26; consequently this
value is significant. It appears that there is a slight bias in the magnetic 
method. If the magnetic method were to be used, a larger experiment 
should be run in order to obtain an accurate estimate of the bias and 
correct for it. 
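The regression quantities used in this illustration can be recomputed from Table 3. The sketch below is illustrative, with hypothetical names: it fits the least-squares line from the raw data and then evaluates (18) with the rounded slope 1.12 and the residual sum of squares 2,766 quoted in the text. Using the unrounded slope and a residual sum of squares recomputed from the data would give a somewhat different $t$, so the last step deliberately follows the text's rounded figures.

```python
import math

Y = [116, 132, 104, 139, 114, 129, 720, 174, 312, 338, 465]  # stripping method
X = [105, 120, 85, 121, 115, 127, 630, 155, 250, 310, 443]   # magnetic method

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
sxx = sum((x - x_bar) ** 2 for x in X)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
b = sxy / sxx            # least-squares slope
a = y_bar - b * x_bar    # intercept

def slope_t(b, beta0, n, sxx, residual_ss):
    """Formula (18) for testing the hypothesis beta = beta0."""
    return (b - beta0) * math.sqrt((n - 2) * sxx / residual_ss)

# Evaluate (18) with the rounded figures printed in the text.
t_text = slope_t(1.12, 1.0, 11, 301826, 2766)
print(round(a, 2), round(b, 2), round(sxx), round(t_text, 2))
```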

THE F DISTRIBUTION 

It will be recalled that it was necessary to assume that $\sigma_x = \sigma_y$ in
order to apply the $t$ distribution to testing the difference between two
means. In order to justify the reasonableness of this assumption, it
is necessary to derive a distribution function for testing the equality
of two variances. It will be found that such a distribution function
has many other uses as well. To this end, let $u$ and $v$ possess
independent $\chi^2$ distributions with $\nu_1$ and $\nu_2$ degrees of freedom, respectively.
Then consider the problem of finding the distribution function of $u/v$.
Because of the independence of $u$ and $v$, it follows from (8) that

$$\begin{aligned}
f(u, v) &= f(u)\, f(v) \\
&= c\, u^{\frac{\nu_1 - 2}{2}} e^{-\frac{u}{2}} \cdot c\, v^{\frac{\nu_2 - 2}{2}} e^{-\frac{v}{2}}
\end{aligned}$$

where $c$ denotes any and all constants of no immediate interest. Let

$$w = \frac{u}{v}$$

represent a change of variable from $u$ to $w$ with $v$ held fixed. Since
$u = vw$, it follows from (7) that

$$\begin{aligned}
f(w, v) &= f(u, v)\, v \\
&= c\, (vw)^{\frac{\nu_1 - 2}{2}}\, v^{\frac{\nu_2}{2}}\, e^{-\frac{v}{2}(1 + w)} \\
&= c\, w^{\frac{\nu_1 - 2}{2}}\, v^{\frac{\nu_1 + \nu_2 - 2}{2}}\, e^{-\frac{v}{2}(1 + w)}
\end{aligned}$$

In order to obtain the distribution function of $w$, it is necessary to
integrate out the variable $v$; hence

$$\begin{aligned}
f(w) &= \int_0^\infty f(w, v)\,dv \\
&= c\, w^{\frac{\nu_1 - 2}{2}} \int_0^\infty v^{\frac{\nu_1 + \nu_2 - 2}{2}}\, e^{-\frac{v}{2}(1 + w)}\,dv
\end{aligned}$$





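Let $y = \frac{v}{2}(1 + w)$; then $f(w)$ reduces to

$$\begin{aligned}
f(w) &= c\, w^{\frac{\nu_1 - 2}{2}} \int_0^\infty \left(\frac{2y}{1 + w}\right)^{\frac{\nu_1 + \nu_2 - 2}{2}} e^{-y}\, \frac{2}{1 + w}\,dy \\
&= \frac{c\, w^{\frac{\nu_1 - 2}{2}}}{(1 + w)^{\frac{\nu_1 + \nu_2}{2}}} \int_0^\infty y^{\frac{\nu_1 + \nu_2 - 2}{2}} e^{-y}\,dy \\
&= \frac{c\, w^{\frac{\nu_1 - 2}{2}}}{(1 + w)^{\frac{\nu_1 + \nu_2}{2}}}
\end{aligned} \tag{19}$$

For certain applications, it is more convenient to work with a slight
variation of $w$ than with $w$ itself; therefore consider the variable

$$F = \frac{u/\nu_1}{v/\nu_2} = \frac{\nu_2}{\nu_1}\,\frac{u}{v} = \frac{\nu_2}{\nu_1}\, w$$

If this change of variable from $w$ to $F$ is made in (19), it follows that

$$f(F) = f(w)\, \frac{\nu_1}{\nu_2} = \frac{c\, F^{\frac{\nu_1 - 2}{2}}}{(\nu_2 + \nu_1 F)^{\frac{\nu_1 + \nu_2}{2}}}$$

These derivations prove the following theorem.

Theorem V. If $u$ and $v$ possess independent $\chi^2$ distributions with
$\nu_1$ and $\nu_2$ degrees of freedom, respectively, then

$$F = \frac{u/\nu_1}{v/\nu_2}$$

has the $F$ distribution function with $\nu_1$ and $\nu_2$ degrees of freedom given by

$$f(F) = \frac{c\, F^{\frac{\nu_1 - 2}{2}}}{(\nu_2 + \nu_1 F)^{\frac{\nu_1 + \nu_2}{2}}}$$
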
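Theorem V can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than part of the text: it generates independent $\chi^2$ variables as gamma variables with shape $\nu/2$ and scale 2, forms $F$ as in the theorem, and compares the average of the simulated values with $\nu_2/(\nu_2 - 2)$, the known mean of the $F$ distribution for $\nu_2 > 2$ (a standard result not derived here). The degrees of freedom, seed, and trial count are arbitrary.

```python
import random

def f_samples(nu1, nu2, trials, seed=3):
    """Simulate F = (u/nu1) / (v/nu2) with u, v independent chi-square variables."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        u = rng.gammavariate(nu1 / 2, 2.0)  # chi-square with nu1 degrees of freedom
        v = rng.gammavariate(nu2 / 2, 2.0)  # chi-square with nu2 degrees of freedom
        values.append((u / nu1) / (v / nu2))
    return values

nu1, nu2 = 5, 12
values = f_samples(nu1, nu2, trials=40_000)
average = sum(values) / len(values)
expected = nu2 / (nu2 - 2)  # mean of F when nu2 > 2
print(average, expected)
```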
APPLICATIONS OF THE F DISTRIBUTION


1. Testing the Compatibility of Two Variances

Since the $F$ distribution was derived partly in order to justify the
assumption of the equality of variances which is needed in the $t$ test
when that test is applied to testing the difference between two means,
consider the problem of testing the hypothesis that $\sigma_x = \sigma_y$ under the
assumption that $x$ and $y$ are normally distributed. Let $s_x^2$ and $s_y^2$
be sample variances based upon random samples of size $n_x$ and $n_y$,
respectively, from these two populations. Then, since $n_x s_x^2/\sigma_x^2$ and
$n_y s_y^2/\sigma_y^2$ possess independent $\chi^2$ distributions,

$$\frac{u}{\nu_1} = \frac{n_x s_x^2}{(n_x - 1)\sigma_x^2} \quad \text{and} \quad \frac{v}{\nu_2} = \frac{n_y s_y^2}{(n_y - 1)\sigma_y^2}$$

satisfy the requirements for $u/\nu_1$ and $v/\nu_2$ in Theorem V. By hypothesis
$\sigma_x = \sigma_y$; therefore by Theorem V

$$F = \frac{n_x s_x^2/(n_x - 1)}{n_y s_y^2/(n_y - 1)}$$

possesses the $F$ distribution with $n_x - 1$ and $n_y - 1$ degrees of freedom.
This test, like the $t$ test, possesses the desirable feature of being
independent of population parameters.

As a numerical illustration, consider the problem that illustrated the
application of the $t$ distribution to the testing of the difference between
two normal means. From Table 2 and immediately following,

$$\hat{\sigma}_x^2 = \frac{n_x s_x^2}{n_x - 1} = 0.071, \qquad \hat{\sigma}_y^2 = \frac{n_y s_y^2}{n_y - 1} = 0.027$$

Therefore $F = 2.63$ with $\nu_1 = \nu_2 = 9$ degrees of freedom. It is
necessary to consult tables of critical values of the $F$ distribution in order
to decide whether this value of $F$ is unreasonably large or small. Such
values are to be found in Table V.

Since the $F$ distribution depends on the two parameters $\nu_1$ and $\nu_2$,
a three-way table would be needed to tabulate the value of $F$
corresponding to different probabilities and values of $\nu_1$ and $\nu_2$. As a
consequence, only the 5% and 1% right-tail-area points are tabulated
corresponding to various values of $\nu_1$ and $\nu_2$. The technique of the use of
Table V will be explained by means of the graph in Fig. 6, which
illustrates the graph of $f(F)$ for a typical pair of values of $\nu_1$ and $\nu_2$. Let
$F_1$ denote the value of $F$ for which $P[F < F_1] = 0.025$, and $F_2$ the
value for which $P[F > F_2] = 0.025$. If the sample value of $F$ falls
outside the interval $(F_1, F_2)$, the hypothesis of a common $\sigma^2$ will be
rejected. For convenience of notation, let $F' = 1/F$. Since
$F = \hat{\sigma}_x^2/\hat{\sigma}_y^2$ with $\nu_1$ and $\nu_2$ degrees of freedom, $F' = \hat{\sigma}_y^2/\hat{\sigma}_x^2$ with $\nu_2$
and $\nu_1$ degrees of freedom. By means of the reciprocal function,
$F'$, the probability of $F < F_1$ can be evaluated as follows:

$$0.025 = P[F < F_1] = P\left[\frac{1}{F} > \frac{1}{F_1}\right] = P\left[F' > \frac{1}{F_1}\right]$$

Fig. 6. A typical $F$ distribution.

This result shows that the left critical point of the $F$ distribution
corresponds to the right critical point of the $F'$ distribution. As a result,
it is necessary to find only right critical points for $F$ and $F'$ to determine
$F_2$ and $F_1$. Because of this property of $F$, only right critical points
for $F$ are tabulated. Unfortunately, only the 5% and 1% critical
points have been tabulated in Table V; consequently it is necessary
to interpolate roughly half way between these two values in order
to obtain an approximate 2½% critical point.

In view of this reciprocal property, the procedure to be followed is
always to place the larger of the two unbiased variance estimates in
the numerator of $F$; consequently $\hat{\sigma}_x^2$ will always denote the larger
of the two estimates. If the hypothesis of a common $\sigma^2$ is rejected
whenever the sample value of $F$ exceeds its 2½% point, the hypothesis
will be rejected whenever the original $F$ falls outside the interval
$(F_1, F_2)$, because, when $F > 1$, $F_2$ will serve as the critical value, and
when $F < 1$, $F'$ will be used instead and $F_2'$ will serve as the critical
value. But, as was demonstrated in the preceding paragraph, $F_2'$ for
$F'$ corresponds to $F_1$ for $F$.

If this procedure is applied to the numerical problem being discussed,
it will be found from Table V that the 5% critical value is

$$F_2 = 4.5, \qquad \nu_1 = \nu_2 = 9$$

The sample value of $F = 2.63$ is therefore not significant. This result
implies that the assumption of equal variances is a reasonable one
and that the significant value of $t$ obtained in connection with this
problem when testing the hypothesis $m_x = m_y$ may not be reasonably
attributed to a lack of the assumption $\sigma_x = \sigma_y$ being satisfied. This
check on the reasonableness of the assumption that $\sigma_x = \sigma_y$ should be
carried out whenever the $t$ test is used to test the difference between
two means. It does not follow, however, that, if the hypothesis
$\sigma_x = \sigma_y$ is not substantiated, a significant value of $t$ will be due to a
lack of this assumption's being satisfied.
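The variance-ratio procedure can be sketched as follows. The sums of squares are those found for Table 2, and the critical value 4.5 is the figure quoted in the text; the variable names are illustrative. Note that the unrounded ratio is about 2.67, whereas the text obtains 2.63 by first rounding the two estimates to 0.071 and 0.027. The conclusion is the same either way.

```python
# Sums of squared deviations (n * s^2) and sample sizes from the Table 2 illustration.
ss_x, n_x = 0.64, 10
ss_y, n_y = 0.24, 10

var_x = ss_x / (n_x - 1)  # unbiased estimate for plot 1
var_y = ss_y / (n_y - 1)  # unbiased estimate for plot 2

# Place the larger estimate in the numerator, as the text prescribes.
F = max(var_x, var_y) / min(var_x, var_y)

critical_value = 4.5  # critical value quoted in the text for 9 and 9 degrees of freedom
significant = F > critical_value
print(round(var_x, 3), round(var_y, 3), round(F, 2), significant)
```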

2. Testing the Homogeneity of a Set of Means 

Problems similar to the one that arose in connection with Table 2 
are rather common, but in the early stages of experimentation there 
may be several means for comparison rather than just two. For 
example, phosphorus in several different amounts may have been 
applied to equal numbers of plots, or various compounds may have 
been tried as possible fertilizers to equal numbers of plots. For such 
experiments it is incorrect to use the difference-of-two-means technique 
on any two selected means because these differences will be a function 
of the number of means available for comparison. If there were a 
large number of such means, the smallest and largest differences would 
differ considerably by chance, even if all the means had been drawn 
from the same population. A set of more than two means can be 
legitimately compared by means of the F distribution in the following 
manner. 

Consider a set of $n$ values of the statistical variable $x$ arranged into
$a$ rows and $b$ columns. For convenience of notation, let $x_{ij}$ represent
the value found in the $i$th row and $j$th column, $\bar{x}_{.j}$ the mean of the $j$th
column, and $\bar{x}_{i.}$ the mean of the $i$th row. This arrangement and
notation is indicated in Table 4.

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TABLE 4 


Xu 

X\2 * 

•• XI j • 

• * 216 

*1- 

X21 

222 * 

• £ 2 / ‘ 

* 226 


Xtl 

Xi2 * 

* 2 *, 

* 2*6 

£i- 

Xal 

X a 2 * 

• * 2 a , 

• * 2 fl 6 

x a . 

x.i 

£.2 • 

2.; ■■ 

• £-6 

£ 

; 


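Suppose that these values of x are random sample values from a
normal population with mean m and variance $\sigma^2$ and that the arrangement
of the values into rows and columns is a random one. Then the
variance of x may be analyzed as follows.

$$
\begin{aligned}
\sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x})^2
&= \sum_{i=1}^{a}\sum_{j=1}^{b}\left[(x_{ij}-\bar{x}_{\cdot j}) + (\bar{x}_{\cdot j}-\bar{x})\right]^2 \\
&= \sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x}_{\cdot j})^2
 + \sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{x}_{\cdot j}-\bar{x})^2 \\
&\quad + 2\sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x}_{\cdot j})(\bar{x}_{\cdot j}-\bar{x})
\end{aligned}
\tag{20}
$$
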

If the cross-product term is summed with respect to i first, $\bar{x}_{\cdot j} - \bar{x}$
will be a constant; hence this sum may be written as

$$2\sum_{j=1}^{b}(\bar{x}_{\cdot j}-\bar{x})\left[\sum_{i=1}^{a}(x_{ij}-\bar{x}_{\cdot j})\right]$$

Since the sum with respect to i is the sum of the deviations of the
elements of the jth column from their mean, this cross-product term
vanishes; consequently (20) reduces to

$$\sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x}_{\cdot j})^2 + a\sum_{j=1}^{b}(\bar{x}_{\cdot j}-\bar{x})^2 \tag{21}$$

Now the $x_{ij}$ are merely random-sample values of the variable x;
therefore by Theorem II it follows that

$$\frac{1}{\sigma^2}\sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x})^2 \tag{22}$$

possesses a $\chi^2$ distribution with ab - 1 degrees of freedom. Since
the elements of any given column are also random-sample values of
the variable x, it follows from Theorem II and Theorem III that

$$\frac{1}{\sigma^2}\sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x}_{\cdot j})^2 \tag{23}$$

possesses a $\chi^2$ distribution with b(a - 1) degrees of freedom. Theorem
II applies to each column sum of squares, while Theorem III permits
the combining of these quantities. Finally, because the $\bar{x}_{\cdot j}$ constitute
a set of b random means, it follows from Theorem II that

$$\frac{a}{\sigma^2}\sum_{j=1}^{b}(\bar{x}_{\cdot j}-\bar{x})^2 \tag{24}$$

possesses a $\chi^2$ distribution with b - 1 degrees of freedom. The factor
a occurs here because the variable $\bar{x}_{\cdot j}$ is a mean with variance $\sigma^2/a$
and therefore the sum of squares must be divided by $\sigma^2/a$ rather than
by $\sigma^2$.

It is clear from (21) that not all three of these quantities possessing
$\chi^2$ distributions are independent. If the first term on the right were
unusually large, for example, the left side would be made unusually
large unless the second term on the right became small to compensate
for the increase in the first term. However, it can be shown that the
two terms on the right of (21) are independently distributed. The
proof of this fact is fairly complicated and therefore will not be considered
here. If the fact is assumed here, it follows that of these three
quantities only (23) and (24) are independently distributed. It therefore
follows from Theorem V that

$$
F = \frac{\dfrac{a\sum_{j=1}^{b}(\bar{x}_{\cdot j}-\bar{x})^2}{\sigma^2(b-1)}}
         {\dfrac{\sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x}_{\cdot j})^2}{\sigma^2 b(a-1)}}
  = \frac{ab(a-1)\sum_{j=1}^{b}(\bar{x}_{\cdot j}-\bar{x})^2}
         {(b-1)\sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x}_{\cdot j})^2}
\tag{25}
$$

possesses an F distribution with $\nu_1 = b - 1$ and $\nu_2 = b(a - 1)$ degrees
of freedom.
In applying (25) to problems, the arrangement of the sample values
into columns will be with respect to some criterion that the experimenter
is interested in testing. For example, as suggested at the
beginning of this section, each column may correspond to a different
amount of phosphorus added, or to a different compound. However,
under the hypothesis that the variable x is independent of the criterion
used for the classification, the columns may be treated as random
samples and the F distribution may be applied to (25). If F exceeds
its critical value, the hypothesis would be rejected and the experimenter
would have evidence for believing that his criterion for classifying the
data into columns was pertinent. This conclusion follows from the
fact that, if the means of columns vary more than could be reasonably
attributed to the random sampling variation of a normal variable,
the value of the numerator in (25) would tend to be too large but the
value of the denominator would not be so affected, with the result that
F would tend to be too large. In a problem of this type, only the right
tail of the F distribution is used for determining a critical value because
the experimenter is interested only in knowing whether the means of
columns vary too much and hence always applies (25) directly rather
than considering its reciprocal also as is done when testing the hypothesis
$\sigma_x = \sigma_y$. In applying (25), one therefore chooses the 5% value of
F in Table V when using a 5% level of significance.

As a numerical illustration, consider the data of Table 5 on the yield 
of potatoes in pounds per plot in which 5 different treatments were 
used on 4 plots each. Although there appear to be treatment differences 
here, in order to test this belief it will be assumed that the 20 yields 
may be considered random-sample values of a normal variable. This 
assumption implies that the 5 treatments do not differ in their effects 
on yield and also that the 4 plots do not differ in fertility. 


TABLE 5

                  Treatment
Plot      A      B      C      D      E
  1      306    349    442    295    457
  2      288    297    434    268    415
  3      307    304    419    310    467
  4      268    308    404    166    428
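
For computing purposes, the sums in (24) and (23) may be written

$$a\sum_{j=1}^{b}(\bar{x}_{\cdot j}-\bar{x})^2 = \frac{1}{a}\sum_{j=1}^{b}\left(\sum_{i=1}^{a}x_{ij}\right)^2 - \frac{1}{ab}\left(\sum_{i=1}^{a}\sum_{j=1}^{b}x_{ij}\right)^2$$

and

$$\sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x}_{\cdot j})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}x_{ij}^2 - \frac{1}{a}\sum_{j=1}^{b}\left(\sum_{i=1}^{a}x_{ij}\right)^2$$

These computing forms are readily verified by expanding the left sides
and expressing all means in terms of sums. On a calculating machine
the sums and sums of squares of elements for each column are calculated,
then the sum of squares of these column totals is obtained.
Calculations for this problem give

$$4\sum_{j=1}^{5}(\bar{x}_{\cdot j}-\bar{x})^2 = 2{,}509{,}384 - 2{,}402{,}631 = 106{,}753$$

and

$$\sum_{i=1}^{4}\sum_{j=1}^{5}(x_{ij}-\bar{x}_{\cdot j})^2 = 2{,}527{,}292 - 2{,}509{,}384 = 17{,}908$$

Then when (25) is applied,

$$F = \frac{15(106{,}753)}{4(17{,}908)} = 22, \qquad \nu_1 = 4, \quad \nu_2 = 15$$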


From Table V it is clear that this result is highly significant; therefore 
the 5 treatments undoubtedly differ in their effect on yield. A statistical 
test would hardly have been necessary here because an inspection of 
Table 5 will reveal that treatments C and E are superior to the others. 
However, if the differences had not been quite so pronounced, the 
experimenter would not have been able to make a valid judgment 
without some statistical test. 
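
The arithmetic of this example can be checked with a short script. The following is a minimal sketch in Python, not part of the original text; the function name `one_way_f` and the dictionary `TABLE_5` are names introduced here for illustration. It applies the computing forms for (23) and (24) and then forms the ratio (25).

```python
# Yields from Table 5: one list per treatment (column), one entry per plot (row).
TABLE_5 = {
    "A": [306, 288, 307, 268],
    "B": [349, 297, 304, 308],
    "C": [442, 434, 419, 404],
    "D": [295, 268, 310, 166],
    "E": [457, 415, 467, 428],
}


def one_way_f(columns):
    """Return (between_ss, within_ss, F, nu1, nu2) for b columns of a values each."""
    b = len(columns)
    a = len(columns[0])
    col_totals = [sum(col) for col in columns]
    grand_total = sum(col_totals)
    sum_of_squares = sum(x * x for col in columns for x in col)
    # Computing forms: every sum of squares is expressed through totals.
    col_term = sum(t * t for t in col_totals) / a
    correction = grand_total ** 2 / (a * b)
    between_ss = col_term - correction        # a * sum of (column mean - grand mean)^2
    within_ss = sum_of_squares - col_term     # variation within columns
    nu1, nu2 = b - 1, b * (a - 1)
    f_value = (between_ss / nu1) / (within_ss / nu2)
    return between_ss, within_ss, f_value, nu1, nu2


between_ss, within_ss, f_value, nu1, nu2 = one_way_f(list(TABLE_5.values()))
print(round(between_ss), round(within_ss), round(f_value, 2), nu1, nu2)
```

The two sums of squares agree with the values 106,753 and 17,908 obtained above, and the ratio rounds to the value 22 quoted in the text.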


3. Testing the Homogeneity of Rows and Columns

In the last section, interest was centered on the variation of column
means because the amount of this variation determined whether the
classification with respect to columns was significant. However, the
technique of breaking down the fundamental sum of squares into
components of experimental interest, as was done in (21), can be
generalized. It is known as the analysis-of-variance technique. In
this method the fundamental sum of squares is analyzed into components
such that one component measures that part of the variation
that is being tested and another component measures what is often
called the experimental error, that is, the variation in the fundamental
variable after the effects of the controlled variables have been eliminated.
For the classification considered in the last section, the fundamental
sum of squares on the left side of (21) is broken down into two
sums of squares. The second term on the right measures the variation
between columns and hence measures that part of the variation being
tested. The first term measures the variation within columns and
is not influenced by the variation between columns; consequently this
term measures what may be thought of as the natural variation after
column effects have been eliminated, or as experimental error.

Consider the analysis-of-variance technique when both row and 
column variation are of interest. The fundamental sum of squares is 
broken down as follows. 


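$$
\begin{aligned}
\sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x})^2
&= \sum_{i=1}^{a}\sum_{j=1}^{b}\left[(\bar{x}_{\cdot j}-\bar{x}) + (\bar{x}_{i\cdot}-\bar{x})
   + (x_{ij}-\bar{x}_{\cdot j}-\bar{x}_{i\cdot}+\bar{x})\right]^2 \\
&= \sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{x}_{\cdot j}-\bar{x})^2
 + \sum_{i=1}^{a}\sum_{j=1}^{b}(\bar{x}_{i\cdot}-\bar{x})^2 \\
&\quad + \sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x}_{\cdot j}-\bar{x}_{i\cdot}+\bar{x})^2
\end{aligned}
$$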


This result follows from the fact that all cross-product terms vanish on 
summation. Of the three terms on the right, the first measures the 
variation between columns, the second measures the variation between 
rows, and the last measures the variation after row and column effects 
have been eliminated. This fundamental identity simplifies somewhat 
into 

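$$
\begin{aligned}
\sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x})^2
&= a\sum_{j=1}^{b}(\bar{x}_{\cdot j}-\bar{x})^2 + b\sum_{i=1}^{a}(\bar{x}_{i\cdot}-\bar{x})^2 \\
&\quad + \sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x}_{\cdot j}-\bar{x}_{i\cdot}+\bar{x})^2
\end{aligned}
\tag{26}
$$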


If, as in the preceding section, it is assumed that x is normally distributed
with variance $\sigma^2$ and the classification into rows and columns
is a random classification, then it can be shown that all three quantities
on the right of (26) when divided by $\sigma^2$ possess independent $\chi^2$ distributions
with b - 1, a - 1, and (a - 1)(b - 1) degrees of freedom,
respectively. If the experimenter were interested in the means of
columns, he would apply Theorem V to

$$\frac{u}{\nu_1} = \frac{a\sum_{j=1}^{b}(\bar{x}_{\cdot j}-\bar{x})^2}{\sigma^2(b-1)}
\qquad\text{and}\qquad
\frac{v}{\nu_2} = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x}_{\cdot j}-\bar{x}_{i\cdot}+\bar{x})^2}{\sigma^2(a-1)(b-1)}$$

If he were interested in the means of rows, he would apply the F test
to

$$\frac{u}{\nu_1} = \frac{b\sum_{i=1}^{a}(\bar{x}_{i\cdot}-\bar{x})^2}{\sigma^2(a-1)}
\qquad\text{and}\qquad
\frac{v}{\nu_2} = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b}(x_{ij}-\bar{x}_{\cdot j}-\bar{x}_{i\cdot}+\bar{x})^2}{\sigma^2(a-1)(b-1)}$$

Although the proofs that the F distribution may be applied as indicated
are fairly involved, the mechanics of applying the F test are
straightforward.

If the columns in the preceding test differ significantly, the F test 
may still be applied to testing the homogeneity of the rows, provided 
that a further assumption concerning the linearity of means is made. 
The meaning of this assumption will not be discussed here; however, 
it is a very plausible assumption. 

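As a numerical illustration of these ideas, consider once more the
problem of Table 5. From (26) it is clear that the last sum is most
easily calculated by subtraction after the remaining sums have been
calculated. Calculations here give

$$\sum_{i=1}^{4}\sum_{j=1}^{5}(x_{ij}-\bar{x})^2 = \sum_{i=1}^{4}\sum_{j=1}^{5}x_{ij}^2 - \frac{1}{20}\left(\sum_{i=1}^{4}\sum_{j=1}^{5}x_{ij}\right)^2 = 124{,}661$$

$$4\sum_{j=1}^{5}(\bar{x}_{\cdot j}-\bar{x})^2 = \frac{1}{4}\sum_{j=1}^{5}\left(\sum_{i=1}^{4}x_{ij}\right)^2 - \frac{1}{20}\left(\sum_{i=1}^{4}\sum_{j=1}^{5}x_{ij}\right)^2 = 106{,}753$$

$$5\sum_{i=1}^{4}(\bar{x}_{i\cdot}-\bar{x})^2 = \frac{1}{5}\sum_{i=1}^{4}\left(\sum_{j=1}^{5}x_{ij}\right)^2 - \frac{1}{20}\left(\sum_{i=1}^{4}\sum_{j=1}^{5}x_{ij}\right)^2 = 9{,}035$$

Hence

$$\sum_{i=1}^{4}\sum_{j=1}^{5}(x_{ij}-\bar{x}_{\cdot j}-\bar{x}_{i\cdot}+\bar{x})^2 = 8{,}873$$

The test for homogeneity of columns yields

$$F = \frac{12(106{,}753)}{4(8{,}873)} = 36, \qquad \nu_1 = 4, \quad \nu_2 = 12$$

If this result is compared with that obtained by the analysis following
Table 5, it will be observed that the present analysis is sharper than
that of the previous section, implying that the elimination of row
variability reduced the error variance in the denominator. The test
for homogeneity of rows gives

$$F = \frac{12(9{,}035)}{3(8{,}873)} = 4.1, \qquad \nu_1 = 3, \quad \nu_2 = 12$$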


From Table V it will be found that the critical value of F at the 5%
level of significance is 3.49; consequently the value 4.1 is barely significant.
This result indicates that an explanation other than sampling
variation should be sought to account for the row variability. Experience
with experiments of this type shows that there is considerable
variation of soil fertility in experimental plots unless the plots are small
and close together. The lack of homogeneity in plot yields may therefore
be caused by differences of fertility in the 4 plots.
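
The row-and-column analysis of Table 5 can be reproduced in the same way. The sketch below is in Python and is an illustration only; the function name `two_way_f` and the variable `ROWS` are assumptions introduced here. As in the text, the error sum of squares is obtained by subtraction from the decomposition (26).

```python
# Table 5 yields: each inner list is one plot (row); entries are treatments A to E.
ROWS = [
    [306, 349, 442, 295, 457],
    [288, 297, 434, 268, 415],
    [307, 304, 419, 310, 467],
    [268, 308, 404, 166, 428],
]


def two_way_f(rows):
    """Return (col_ss, row_ss, error_ss, F_columns, F_rows) for an a-by-b layout."""
    a = len(rows)       # number of rows
    b = len(rows[0])    # number of columns
    grand_total = sum(sum(r) for r in rows)
    correction = grand_total ** 2 / (a * b)
    total_ss = sum(x * x for r in rows for x in r) - correction
    col_totals = [sum(r[j] for r in rows) for j in range(b)]
    row_totals = [sum(r) for r in rows]
    col_ss = sum(t * t for t in col_totals) / a - correction
    row_ss = sum(t * t for t in row_totals) / b - correction
    error_ss = total_ss - col_ss - row_ss   # last term of (26), by subtraction
    error_ms = error_ss / ((a - 1) * (b - 1))
    f_columns = (col_ss / (b - 1)) / error_ms
    f_rows = (row_ss / (a - 1)) / error_ms
    return col_ss, row_ss, error_ss, f_columns, f_rows


col_ss, row_ss, error_ss, f_columns, f_rows = two_way_f(ROWS)
print(round(col_ss), round(row_ss), round(error_ss))
print(round(f_columns, 2), round(f_rows, 2))
```

The computed ratios round to the values 36 and 4.1 given in the text; the second is to be compared with the 5% critical value 3.49 for 3 and 12 degrees of freedom.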


DISTRIBUTION OF THE RANGE 

In certain fields of applied statistics, the amount of routine computation
becomes burdensome unless methods are chosen that involve only
a small amount of it. In industrial quality control work, for example,
the repeated computation of standard deviations as measures of the
variability of a product is undesirable. It is customary in such work
to take the range as the measure of variability. Not only is the range
easy to compute, but also it is simple to explain as a measure of variability
to individuals without a statistical background. For small
samples from a normal population, it can be shown that the range is
nearly as efficient, in a certain sense, for estimating $\sigma$ as is the sample
standard deviation; consequently for small samples the range is a
highly useful statistic.

Consider a random sample, $x_1', x_2', \ldots, x_n'$, drawn from the population
whose distribution function will be denoted by p(x). Let these
sample values be arranged in order of increasing magnitude, and denote
the ordered set by $x_1, x_2, \ldots, x_n$. Now consider the problem of finding
the probability that the smallest value, $x_1$, and the largest value, $x_n$,
will fall within specified intervals. The distribution function of the
range can be found quite easily by means of this probability.

Let the x axis be divided into the five intervals $(-\infty, u)$, $(u, u + \Delta u)$,
$(u + \Delta u, v)$, $(v, v + \Delta v)$, $(v + \Delta v, \infty)$, where u < v are any two values
of x. The probability that x will fall in any particular one of these
intervals is given by the integral of p(x) over that interval; hence the
probabilities corresponding to these five intervals can be written down
even though they cannot be evaluated unless the form of p(x) is known.
In this connection, let

$$p_2 = \int_{u}^{u+\Delta u} p(x)\,dx, \qquad p_3 = \int_{u+\Delta u}^{v} p(x)\,dx, \qquad p_4 = \int_{v}^{v+\Delta v} p(x)\,dx \tag{27}$$

and determine the probability that in a sample of n values of x one
will obtain no value in the first interval, one value in the second interval,
n - 2 values in the third interval, one value in the fourth interval,
and no value in the fifth interval. This procedure is equivalent to
finding the probability that the smallest value in the sample will fall
between u and $u + \Delta u$ while the largest value falls between v and
$v + \Delta v$. The desired probability can be obtained directly from the
multinomial distribution given by (39), Chapter III, by treating x
as a discrete variable which can assume only one of five possible values
corresponding to the five intervals. If $p_1$ and $p_5$ denote the probabilities
that x will fall in the first and fifth intervals, respectively, the desired
probability is given by

$$\frac{n!}{0!\,1!\,(n-2)!\,1!\,0!}\; p_1^{0}\, p_2^{1}\, p_3^{\,n-2}\, p_4^{1}\, p_5^{0}$$

which reduces to

$$n(n-1)\, p_2\, p_4\, p_3^{\,n-2} \tag{28}$$

Expression (28) can be simplified somewhat by simplifying the
integrals of (27). Since p(x) is assumed to be a continuous function,
the mean-value theorem for integrals may be applied here. This
theorem states that, if p(x) is continuous on the interval $(\alpha, \beta)$, then

$$\int_{\alpha}^{\beta} p(x)\,dx = (\beta - \alpha)\,p(X)$$




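where X is some number in the interval $(\alpha, \beta)$. A direct application
of this theorem to (27) shows that

$$p_2 = \Delta u\, p(u + \theta_1 \Delta u), \qquad 0 \le \theta_1 \le 1$$

and

$$p_4 = \Delta v\, p(v + \theta_2 \Delta v), \qquad 0 \le \theta_2 \le 1$$

The first of these two results when applied to $p_3$ yields

$$p_3 = \int_{u+\Delta u}^{v} p(x)\,dx = \int_{u}^{v} p(x)\,dx - \Delta u\, p(u + \theta_1 \Delta u)$$

If these values for $p_2$, $p_3$, and $p_4$ are inserted in (28), it becomes

$$n(n-1)\, p(u + \theta_1 \Delta u)\, p(v + \theta_2 \Delta v)\left[\int_{u}^{v} p(x)\,dx - \Delta u\, p(u + \theta_1 \Delta u)\right]^{n-2} \Delta u\, \Delta v \tag{29}$$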


This expression gives the probability that the smallest and largest
values of a sample of size n will yield a point in the u, v plane that lies
within the rectangle of dimensions $\Delta u$ and $\Delta v$ which has the vertex
nearest the origin at the point (u, v). Since u and v are arbitrary, they
may be treated as statistical variables. In order to find the probability
density function of these two variables, it is necessary to divide
this probability by the area of the rectangle, namely $\Delta u\, \Delta v$, and take
the limit of the resulting quotient as $\Delta u$ and $\Delta v$ approach zero. If this
desired distribution function is denoted by f(u, v), it follows from (29)
that

$$f(u, v) = n(n-1)\, p(u)\, p(v)\left[\int_{u}^{v} p(x)\,dx\right]^{n-2} \tag{30}$$

The preceding developments prove the following theorem.

Theorem VI. If u and v denote the smallest and largest values, respectively,
in a random sample of size n from the population with the continuous
distribution function p(x), then the joint distribution function
of u and v is given by

$$f(u, v) = n(n-1)\, p(u)\, p(v)\left[\int_{u}^{v} p(x)\,dx\right]^{n-2}$$

The distribution function of the range can be obtained very quickly
from this result. Let R = v - u represent a change of variable from
v to R with u held fixed. Then, by (7) and (30),

$$f(u, R) = f(u, v) = n(n-1)\, p(u)\, p(u + R)\left[\int_{u}^{u+R} p(x)\,dx\right]^{n-2} \tag{31}$$

Consequently,

$$f(R) = \int_{a}^{b-R} f(u, R)\,du$$

where a, b is the range for which p(x) is defined. The upper limit of
b - R arises from the fact that u = v - R; therefore, for R fixed, u
cannot exceed the value obtained by giving v its maximum value b.
If the explicit expression for f(u, R) as given by (31) is inserted in this
integral, the following formula for the distribution function of the
range results.

$$f(R) = n(n-1)\int_{a}^{b-R} p(u)\, p(u + R)\left[\int_{u}^{u+R} p(x)\,dx\right]^{n-2} du \tag{32}$$


Unless the integral of p(x) is quite simple, this expression is likely
to be difficult to work with, even numerically. As an illustration of a
simple problem, consider the range for a sample of size n from the
horizontal distribution which is defined for $0 \le x \le b$ by p(x) = 1/b.
Here

$$\int_{u}^{u+R} p(x)\,dx = \frac{R}{b}$$

Therefore by (32)

$$f(R) = n(n-1)\int_{0}^{b-R} \frac{1}{b}\cdot\frac{1}{b}\left(\frac{R}{b}\right)^{n-2} du = n(n-1)\, b^{-n} R^{\,n-2}(b - R)$$
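
Formula (32) can also be evaluated numerically, which gives a check on the closed form just obtained. The following Python sketch is an illustration under stated assumptions: the function names `range_density` and `uniform_range_density` are introduced here, and the midpoint rule with a fixed number of steps is merely one convenient choice of numerical integration, not a method taken from the text.

```python
def range_density(p, lower, upper, n, r, steps=400):
    """Approximate f(R) of (32) at R = r for a density p defined on (lower, upper)."""

    def mass(u):
        # Inner integral of p(x) from u to u + r, by the midpoint rule.
        h = r / steps
        return sum(p(u + (k + 0.5) * h) for k in range(steps)) * h

    # Outer integral over u from lower to upper - r, again by the midpoint rule.
    h = (upper - r - lower) / steps
    total = 0.0
    for k in range(steps):
        u = lower + (k + 0.5) * h
        total += p(u) * p(u + r) * mass(u) ** (n - 2)
    return n * (n - 1) * total * h


def uniform_range_density(b, n, r):
    """Closed form n(n-1) b^(-n) R^(n-2) (b - R) for the horizontal distribution."""
    return n * (n - 1) * b ** (-n) * r ** (n - 2) * (b - r)


# Horizontal distribution on (0, 2), samples of size 5, evaluated at R = 0.5.
numeric = range_density(lambda x: 0.5, 0.0, 2.0, 5, 0.5)
closed = uniform_range_density(2.0, 5, 0.5)
print(numeric, closed)
```

For a constant density the midpoint rule is exact apart from rounding, so the two values should agree closely; for other densities the number of steps would have to be chosen with more care.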


APPLICATIONS OF THE RANGE 

In the introduction to the last section it was remarked that the
range was useful as a substitute for the standard deviation as a measure
of variability in certain routine operations. It should therefore be of
interest to know what the relationship is between the range and the
standard deviation. This relationship may be found from the mean of
R. Since

$$m_R = \int_{0}^{b-a} R\, f(R)\,dR$$

it is clear from (32) that the evaluation of the relationship will give
rise to a complicated double integral unless p(x) is a convenient function.
Unfortunately, if x is normally distributed, these integrations
cannot be performed directly; therefore numerical methods of integration
are required. Tables are available for the normal variable case
which express $m_R$ in terms of $\sigma$, corresponding to different values of
n. Table 6 gives a few entries from such a table to indicate the nature
of the relationship.

TABLE 6

n               2       3       4       5       10      50      100     1,000
$m_R/\sigma$    1.128   1.693   2.059   2.326   3.078   4.498   5.015   6.483
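
The first few entries of Table 6 can be reproduced by numerical integration. The sketch below, in Python, does not use the double integral arising from (32); instead it assumes the standard identity for the mean range of a sample from a continuous distribution with cumulative distribution function F, namely $m_R = \int_{-\infty}^{\infty}\left[1 - F(x)^n - (1 - F(x))^n\right]dx$, which is not derived in the text. The function names, the integration limits, and the step count are choices made here for illustration.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal variable."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mean_range_ratio(n, limit=10.0, steps=20000):
    """Approximate m_R / sigma for samples of size n from a normal population."""
    h = 2.0 * limit / steps
    total = 0.0
    for k in range(steps):
        x = -limit + (k + 0.5) * h          # midpoint of the k-th subinterval
        cdf = normal_cdf(x)
        total += 1.0 - cdf ** n - (1.0 - cdf) ** n
    return total * h


for n in (2, 3, 4, 5, 10):
    print(n, round(mean_range_ratio(n), 3))
```

Because the ratio does not depend on m or $\sigma$, it suffices to work with the standard normal variable.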


As an illustration of the use of such tables, consider once more the
technique of constructing a quality control chart for $\bar{x}$ as given in the
section on the distribution of $\bar{x}$ for non-normal populations in Chapter
IV. There a $3\sigma$ band was constructed for controlling $\bar{x}$. If the range
is taken as the measure of variability, $3\sigma_{\bar{x}} = 3\sigma/\sqrt{n}$ will be replaced
by $3m_R/(d_n\sqrt{n})$, where $d_n$ is the value obtained from the table from
which Table 6 was extracted. The value of $m_R$ can be approximated
from the mean of sample values of R based on samples of size n each.
For such charts, n is usually chosen to be an integer near 4; consequently
a fairly large number of such samples is needed before a precise estimate
of either m or $m_R$ will be available.

If n is chosen less than 10, the estimation of $\sigma$ by means of the range
rather than the standard deviation of a sample is quite efficient.
Investigations have shown that it requires approximately 115 sample
ranges based on 6 observations each to yield the same precision as
100 sample standard deviations based on 6 observations each, provided
the variable is normally distributed.
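
A minimal sketch of the resulting control limits, in Python, is given below. The sample values are hypothetical and serve only to show the arithmetic; the function name `range_control_limits` and the dictionary `D_N` are names introduced here, and the constants are the entries of Table 6.

```python
import math

# Entries of Table 6: ratio of the mean range to sigma for a normal variable.
D_N = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 10: 3.078}


def range_control_limits(samples):
    """Return (lower, upper) 3-sigma limits for the sample mean, using ranges."""
    n = len(samples[0])
    means = [sum(s) / n for s in samples]
    ranges = [max(s) - min(s) for s in samples]
    grand_mean = sum(means) / len(means)       # estimate of m
    mean_range = sum(ranges) / len(ranges)     # estimate of m_R
    half_width = 3.0 * mean_range / (D_N[n] * math.sqrt(n))
    return grand_mean - half_width, grand_mean + half_width


# Hypothetical measurements: three samples of size 4 each.
samples = [[10, 12, 11, 13], [9, 11, 12, 10], [11, 13, 12, 14]]
lower, upper = range_control_limits(samples)
print(round(lower, 2), round(upper, 2))
```

In practice many more than three samples would be needed, as the text points out, before the estimates of m and $m_R$ could be regarded as precise.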
The application of the range to quality control charts, as well as a discussion of
various quality control techniques, may be found in:

Control Chart Method of Controlling Quality During Production, American War
Standards, Z1.3-1942, New York: American Standards Association.

EXERCISES 

1. For a certain observed distribution, n = 20, $\bar{x}$ = 42, and s = 5. Assuming
that x is normally distributed: (a) test the hypothesis that $\sigma$ = 8; (b) find 98%
confidence limits for $\sigma^2$; (c) test the hypothesis that m = 50; (d) find 99% confidence
limits for m.

2. For the data of problem 10, Chapter IV, determine 96% confidence limits
for $\sigma^2$.

3. If a quality control chart were to be constructed for controlling $s^2$ based on
samples of 5 each for the data of problem 10, Chapter IV, what control limits for
$s^2$ should be set if $s^2$ for all the data is assumed to be equal to $\sigma^2$?

4. The following data give the amounts of corrosion of pipe coatings for underground
use in a series of field tests in different types of soil. Taking differences of
similar pairs to eliminate differences due to soil type, test the hypothesis that the
two kinds of pipe do not differ in their resistance to corrosion, that is, that m = 0
for such differences.


Soil type                A     B     C     D     E     F     G     H     I     J     K     L     M     N
Lead-coated steel pipe   27.3  18.4  11.9  28.7  11.3  14.8  20.8  21.6  17.9  7.8   18.6  14.7  19.0  66.3
Bare steel pipe          41.3  18.9  21.7  9.8   16.8  9.0   19.3  11.1  32.1  7.4   68.3  20.7  34.4  76.2


5. The following data give the gains of 10 pairs of rats, half of which received
their protein from raw peanuts while the other half received their protein from
roasted peanuts. Test to see whether roasting the peanuts had any effect on their
protein value.

Raw       61   60   56   63   56   63   59   56   44   61
Roasted   55   54   47   59   51   61   57   54   62   58


6. In an industrial experiment a job was performed by 30 workmen according
to method I and by 40 workmen according to method II. The two groups were
equally skilled. The following data give the results of the experiment. Determine
by means of 95% confidence limits how much time on the average the plant
could be expected to save by using method I.

Time   50   51   52   53   54   55   56   57   58   59   60
I       1    3    5    4    7    5    3    1    1    0    0
II      0    1    2    5    8    9    6    3    3    1    2
7. The following data give the intelligence quotients of 25 male juvenile
delinquents and 25 male non-delinquents matched by age, family income, and place
of residence. (a) Taking differences of matched pairs, test the hypothesis that
the mean difference is zero. (b) Treating the two sets as independent, test the
hypothesis that the difference of the two means is zero. (c) Note any difference
in these two results, and comment on the advantages and disadvantages of this
matching technique.

Delinquent       103   80  114  100   91   73  105   98   86
Non-delinquent    99   92  106  104   88   80  109   94   90

Delinquent       101   92   86   93   90   79  108   82   95
Non-delinquent    97   89   91   90   97   84   96   91   86

Delinquent        74  102  105   97   88   94   99
Non-delinquent    83   97   99  103   91   84  106

8. For the data of problem 2, Chapter V, determine confidence limits for $\beta$ in
the regression equation for estimating tensile strength from hardness.

9. The following table gives the condensation of data on the hardness and
bending strength of wood stored outside and inside. Test to see whether either
the mean or variability of hardness or of bending strength is affected by weathering.

                                 Hardness               Bending Strength
                             Outside    Inside        Outside        Inside
Number                          40        100            40            100
Mean                           117        132          6,184          6,270
Sum of squares about mean    8,655     27,244     16,799,390     30,459,499

10. For the data of problem 10, Chapter IV, calculate $s^2$ for the first 30 and the
last 30 entries. (a) Test these two values for homogeneity. (b) Would it be legitimate
to select the samples from the tenth and twelfth hours and test them against
the remaining samples for homogeneity?
11. The following table gives the gains of 4 different types of hogs fed 3 different
rations. Test to see whether the rations or the hog types differ in their effect on
weight.

                    Type
Ration     I       II      III     IV
A          7.0    16.0    10.5    13.5
B         14.0    15.5    15.0    21.0
C          8.5    16.5     9.5    13.5


12. The following data give the impact-strength readings in foot-pounds on 5
lots of insulating material. One specimen from each of 20 different sheets was
tested from each of the 5 lots. The first 10 specimens were cut along the lengthwise
direction of the sheets; the remaining 10 were cut along the crosswise direction.
Test for differences between (a) lots, (b) lengthwise and crosswise specimens.

             Lengthwise                            Crosswise
  I      II     III    IV     V          I      II     III    IV     V
 1.15   1.16   0.79   0.96   0.49       0.89   0.86   0.52   0.86   0.52
 0.84   0.85   0.68   0.82   0.61       0.69   1.17   0.52   1.06   0.53
 0.88   1.00   0.64   0.98   0.59       0.46   1.18   0.80   0.81   0.47
 0.91   1.08   0.72   0.93   0.51       0.85   1.32   0.64   0.97   0.47
 0.86   0.80   0.63   0.81   0.53       0.73   1.03   0.63   0.90   0.57
 0.88   1.01   0.59   0.79   0.72       0.67   0.84   0.58   0.93   0.54
 0.92   1.14   0.81   0.79   0.67       0.78   0.89   0.65   0.87   0.56
 0.87   0.87   0.65   0.86   0.47       0.77   0.84   0.60   0.88   0.55
 0.93   0.97   0.64   0.84   0.44       0.80   1.03   0.71   0.89   0.45
 0.95   1.09   0.75   0.92   0.48       0.79   1.06   0.59   0.82   0.60


13. For problem 10, Chapter IV, express control limits in terms of the range.

14. Find the distribution function of R if x has the distribution function $e^{-x}$,
$x \ge 0$.

15. Find the probability that in a sample of 10 random numbers drawn from the
numbers between 0 and 1 the range will exceed 0.8.

16. Find how many random numbers between 0 and 1 would need to be drawn
in order that the probability will exceed 0.95 that the range will exceed 0.90.

17. Suppose that samples of size 4 are taken from the population given by $e^{-x}$,
$x \ge 0$. (a) Determine the mean value of R. (b) Determine $\sigma$, and then compare
the ratio $m_R/\sigma$ with that given by Table 6 for a normal variable. (c) Determine
limits, $R_1$ and $R_2$, for R such that $P[R < R_1] = 0.025$ and $P[R > R_2] = 0.025$.
18. Derive a formula for the variance of $z = c_1x_1 + \cdots + c_kx_k$ in terms of the
variances and correlations of the x's (a) if the $x_i$ are independent, (b) if the $x_i$ are
correlated.

19. If $f(x) = e^{-x}$, $x \ge 0$, find f(y) where y = 1/x.

20. If $f(x) = e^{-x}$, $x \ge 0$: (a) determine f(x); (b) determine f(z), where z = u/v
and u and v are successive pairs of independent sample values of x; (c) determine
$f(x^2)$.

21. Determine a formula for confidence limits for $\alpha$ in the regression equation
$y = \alpha + \beta x$ by using the methods for finding confidence limits for $\beta$.

22. If x and y possess independent standard normal distributions, derive the
distribution function of z where $z = \sqrt{x^2 + y^2}$. The variable z represents the
radial error in gunnery and bombing problems in which x and y represent independent
coordinate axes errors of equal variability.

23. Using the expression for $f(ns^2/\sigma^2)$, derive the formula $\sigma_{s^2} = \sigma^2\sqrt{2(n-1)}/n$
for the standard deviation of a sample variance, if the sample variance is based
on a random sample of size n from a normal population.

24. Using the results of problem 18, determine the variance of $z = \sum_{i=1}^{k} a_i s_i^2$,
in which the $a_i$ are constants and the $s_i^2$ are independent sample variances
based on samples of size $n_i$.

25. Using the results of problems 23 and 24, determine the $a_i$ so as to minimize
the variance of z under the assumption that the k sample variances were from k
independent normal populations having the same variance. Compare your result
with (5).
NON-PARAMETRIC METHODS

Most significance tests of statistical theory require certain assumptions
concerning the form of some distribution function. In the small-sample
methods of the preceding chapter, it was usually assumed that
the basic variable was normally distributed. It will be recalled that
in the chapter on large-sample methods significance tests involving
means were considered accurate without the customary normality
assumption. However, some statistics possess distributions that depend
heavily upon the form of the basic variable distribution even for
large samples; consequently the use of such statistics is restricted to
situations in which the necessary assumptions are satisfied fairly well.
For situations in which very little is known about the distribution of
the basic variable or for which it is known that the necessary assumptions
are not satisfied, it is necessary to develop methods that do not
require those assumptions. For complete generality of application, it
would be highly desirable to have methods that do not require any
knowledge concerning the form of the basic distribution function.

A second reason for considering such methods is that certain significance
tests that have proved useful in industrial statistics are of this
type. Such tests are usually based on the qualitative order relationships
of data as distinguished from the quantitative relationships employed
in most tests.

A significance test that requires no assumption about the form of the
basic distribution function could hardly be expected to be as efficient
as one that needs some such assumption. To compensate somewhat
for this decrease in efficiency, there is the advantage of complete
generality in applications without the necessity of checking to see
whether certain assumptions are satisfied.

Methods of the type just described are usually called non-parametric
methods because they do not involve the estimation of parameter values
of a distribution function. Although non-parametric techniques are
numerous, only a few of the more common ones will be considered here.
TCHEBYCHEFF'S INEQUALITY

Consider the problem of determining what the probability is that a
variable x will assume a value more than k standard deviations away
from its mean. Assume that f(x) is a continuous distribution function
which possesses a finite variance. Then the integral defining its
variance may be treated as follows.

$$
\begin{aligned}
\sigma^2 &= \int_{-\infty}^{\infty}(x-m)^2 f(x)\,dx \\
&= \int_{-\infty}^{m-k\sigma}(x-m)^2 f(x)\,dx + \int_{m-k\sigma}^{m+k\sigma}(x-m)^2 f(x)\,dx
 + \int_{m+k\sigma}^{\infty}(x-m)^2 f(x)\,dx
\end{aligned}
$$

Since the middle integral is a non-negative quantity for k > 0,

$$\sigma^2 \ge \int_{-\infty}^{m-k\sigma}(x-m)^2 f(x)\,dx + \int_{m+k\sigma}^{\infty}(x-m)^2 f(x)\,dx$$

In each of these integrals the quantity $(x - m)^2$ will assume its minimum
value for that value of x in the range of integration which is
nearest m. For the first integral this value is the upper limit, and for
the second integral it is the lower limit; consequently

$$
\begin{aligned}
\sigma^2 &\ge \int_{-\infty}^{m-k\sigma}(k\sigma)^2 f(x)\,dx + \int_{m+k\sigma}^{\infty}(k\sigma)^2 f(x)\,dx \\
&= k^2\sigma^2\left[\int_{-\infty}^{m-k\sigma} f(x)\,dx + \int_{m+k\sigma}^{\infty} f(x)\,dx\right]
\end{aligned}
$$

The first integral gives the probability that x will be to the left of
$m - k\sigma$, and the second that x will be to the right of $m + k\sigma$; hence
this inequality is equivalent to the inequality

$$\sigma^2 \ge k^2\sigma^2\, P[\,|x - m| > k\sigma\,]$$

If $\sigma^2$ is divided out, this expression yields

$$\text{Tchebycheff's Inequality:}\qquad P[\,|x - m| > k\sigma\,] \le \frac{1}{k^2} \tag{1}$$

Since no assumption was made regarding the form of f(x), any test 
based upon Tchebycheff’s inequality would be a non-parametric test. 

By replacing integrals with sums, it is easily demonstrated that this 
inequality holds for a discrete variable as well. 
For the purpose of observing the decrease in sensitivity in using
(1) when the form of f(x) is known, consider the case in which f(x)
is a normal distribution function. Computations by means of (1) and
Table II yield the probabilities in Table 1.

TABLE 1

k    Tchebycheff    Normal
1    P ≤ 1          P = 0.32
2    P ≤ 0.25       P = 0.05
3    P ≤ 0.11       P = 0.003

From this table it is clear
that the normality assumption, if it can be legitimately made, greatly
increases the sensitivity of such probability statements. By making
further assumptions about f(x) which do not involve a knowledge of
its form, it is possible to obtain more refined inequalities of this type.
Tchebycheff’s inequality is not of much practical value except for
very large samples.
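The entries of Table 1 can be reproduced with a few lines of code. This is a sketch only: it replaces the lookup in Table II by the standard identity $P[|x - m| > k\sigma] = \mathrm{erfc}(k/\sqrt{2})$ for a normal variable, and the function names are illustrative.

```python
from math import erfc, sqrt

def tchebycheff_bound(k):
    """Upper bound 1/k^2 from (1), capped at 1 since it bounds a probability."""
    return min(1.0, 1.0 / k**2)

def normal_two_sided_tail(k):
    """P[|x - m| > k*sigma] when x is normally distributed."""
    return erfc(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(tchebycheff_bound(k), 2), round(normal_two_sided_tail(k), 3))
```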

As an illustration, consider the problem of determining how large
a sample would be needed to reject at the 5% level of significance a
deviation of 3 units in a sample mean if m = 50 and $\sigma = 5$. Here the
variable is $\bar{x}$; hence $\sigma_{\bar{x}} = 5/\sqrt{n}$. In order that P shall not exceed
0.05, it is necessary that k be chosen to satisfy $1/k^2 = 0.05$; therefore
$k = \sqrt{20}$. With this choice of k, Tchebycheff’s inequality becomes

$$P\left[\,|\bar{x} - 50| > \sqrt{20}\,\frac{5}{\sqrt{n}}\,\right] \le 0.05$$

Since $|\bar{x} - 50| = 3$ in this problem, the inequality will be satisfied
if n exceeds the root of the equation

$$3 = \sqrt{20}\,\frac{5}{\sqrt{n}}$$

which is n = 55.6. Thus, a sample of size 56 will suffice here.
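The arithmetic of this illustration may be sketched as follows. The function name and its arguments are illustrative; the calculation simply solves $d = k\sigma/\sqrt{n}$ for n with $k^2 = 1/0.05$.

```python
from math import ceil

def tchebycheff_sample_size(sigma, deviation, level):
    """Root of deviation = k*sigma/sqrt(n) with 1/k^2 = level, and the
    smallest integer sample size exceeding that root."""
    k_squared = 1.0 / level
    n_root = k_squared * sigma**2 / deviation**2
    return n_root, ceil(n_root)

n_root, n = tchebycheff_sample_size(sigma=5, deviation=3, level=0.05)
print(round(n_root, 1), n)
```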


LAW OF LARGE NUMBERS 

An interesting application of Tchebycheff’s inequality arises when
the variable x is the success ratio, p', in n trials of an event for which
the probability of success in a single trial is p. Here m = p and
$\sigma = \sqrt{pq/n}$; therefore (1) becomes

$$P\left[\,|p' - p| > k\sqrt{pq/n}\,\right] \le \frac{1}{k^2}$$

Now choose a number $\epsilon > 0$, and choose $k = \epsilon/\sqrt{pq/n}$. Then the
preceding inequality reduces to

$$P[\,|p' - p| > \epsilon\,] \le \frac{pq}{\epsilon^2 n}$$

No matter how small $\epsilon$ may have been chosen, by allowing the number
of trials, n, to increase sufficiently, P can be made as small as desired.
This conclusion, which is called the law of large numbers, shows the
probability nature of the convergence of a sample success ratio to its
expected value. This probability type of convergence is often called
stochastic convergence to distinguish it from ordinary mathematical
convergence. The law of large numbers does not guarantee, for
example, that the success ratio for a true coin will approach 1/2 as a
limit, but rather that the probability of the success ratio differing
from 1/2 by more than any given amount will approach zero as a limit.
It is customary here to say that the success ratio converges stochastically
to 1/2.
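The behavior described by the law of large numbers can be observed by comparing the bound $pq/(\epsilon^2 n)$ with the exact binomial probability. The sketch below assumes a true coin (p = 1/2) and $\epsilon = 1/10$; these values and the function names are illustrative only. Exact fractions are used so that the comparison $|p' - p| > \epsilon$ is not disturbed by rounding.

```python
from fractions import Fraction
from math import comb

def exact_tail(n, p, eps):
    """Exact P[|p' - p| > eps], where p' = x/n and x is binomial (n, p)."""
    q = 1 - p
    total = Fraction(0)
    for x in range(n + 1):
        if abs(Fraction(x, n) - p) > eps:
            total += comb(n, x) * p**x * q**(n - x)
    return total

def tchebycheff_bound(n, p, eps):
    """Bound pq/(eps^2 n) obtained from Tchebycheff's inequality."""
    return p * (1 - p) / (eps**2 * n)

p, eps = Fraction(1, 2), Fraction(1, 10)
for n in (10, 100, 1000):
    print(n, float(exact_tail(n, p, eps)), float(tchebycheff_bound(n, p, eps)))
```

Both columns decrease toward zero as n increases, the exact probability much more rapidly than the bound.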

TOLERANCE LIMITS 

The problem of determining the variability of a normal variable was
solved in the preceding chapter by finding confidence limits for $\sigma^2$.
If the form of the distribution function is unknown, a different approach
to the problem is necessary. One approach consists in finding
two numbers, $L_1$ and $L_2$, which are based upon the results of sampling
and between which a high percentage of the population may be expected
to lie. Such numbers are called tolerance limits. The name arose
in connection with the industrial problem of controlling a quality
characteristic of a product by finding limits within which with respect
to this characteristic a high percentage of the product could be expected
to lie.

If the numbers $L_1$ and $L_2$ are chosen as the smallest and largest
values, respectively, to be found in the sample, it will be found that
the percentage of the population which can be expected to lie between
$L_1$ and $L_2$ does not depend upon the form of the distribution function
of the variable being considered. This is also true of certain other
common choices for $L_1$ and $L_2$, but they will not be considered here.

Let the variable x possess the continuous distribution function p(x).
Then by Theorem VI, Chapter VIII, if u and v denote the smallest
and largest values respectively in a random sample of size n, the joint
distribution of u and v will be given by

$$f(u, v) = n(n - 1)\,p(u)\,p(v) \left[ \int_u^v p(x)\,dx \right]^{n-2} \tag{2}$$

Now the integral within the brackets is precisely the desired proportion
of the population lying between the extreme values of the sample;
therefore consider the change of variable from v to z with u held fixed,
where

$$z = \int_u^v p(x)\,dx \tag{3}$$

By means of the calculus theorem stated after (5), Chapter VI, it
follows that dz/dv = p(v) and therefore by (7), Chapter VIII, that

$$f(u, z) = \frac{f(u, v)}{p(v)}$$

Hence, from (2),

$$f(u, z) = n(n - 1)\,p(u)\,z^{n-2} \tag{4}$$

Now hold z fixed and consider the change of variable from u to w,
where

$$w = \int_{-\infty}^{u} p(x)\,dx \tag{5}$$

Here dw/du = p(u); consequently

$$f(w, z) = \frac{f(u, z)}{p(u)}$$

Hence, from (4),

$$f(w, z) = n(n - 1)\,z^{n-2} \tag{6}$$

In order to obtain the distribution function of z, it is merely necessary
to integrate this function with respect to w over its range of values
for z fixed. From (3) and (5) it will be observed that w + z equals the
probability that x will not exceed v; therefore $w + z \le 1$. Since z is
being held fixed, w can assume values from 0 to 1 - z only; consequently,
from (6),

$$f(z) = \int_0^{1-z} f(w, z)\,dw = \int_0^{1-z} n(n - 1)\,z^{n-2}\,dw = n(n - 1)\,z^{n-2}(1 - z)$$

This demonstrates the following theorem. 

Theorem I. If a variable possesses a continuous distribution function
and if z denotes the proportion of the population that lies between the
extreme values of a random sample of size n drawn from this population,
then the distribution function of z is given by

$$f(z) = n(n - 1)\,z^{n-2}(1 - z)$$

As an illustration of the application of this theorem, consider the
problem of determining how large a sample must be taken in order to
be certain with a probability of 0.95 that at least 99% of the population
will lie between the extreme values of the sample. The solution is given
by determining the value of n which satisfies the equation

$$\int_{0.99}^{1} f(z)\,dz = 0.95$$

If the value of f(z) given by Theorem I is inserted and the integration
is performed, this equation becomes

$$n(n - 1) \left[ \frac{1 - (0.99)^{n-1}}{n - 1} - \frac{1 - (0.99)^{n}}{n} \right] = 0.95$$

which simplifies to

$$(0.99)^n = \frac{4.95}{n + 99}$$

It will be found by trial that the integer that most nearly satisfies this
equation is n = 473; consequently a sample of this size is required to
obtain the desired coverage and certainty from the extreme values of
the sample as tolerance limits. It is clear from this example that a very
large sample is necessary before the extreme values of a sample will
suffice to set limits within which practically all the population would
be expected to lie.
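The solution by trial can be delegated to a short program. The sketch below integrates the density of Theorem I from b to 1, which gives $1 - n b^{n-1} + (n - 1) b^n$, and searches for the smallest n at which this probability reaches the required value; the function names are illustrative.

```python
def coverage_probability(n, b):
    """P[z >= b] when z has the density n(n-1) z^(n-2) (1-z) of Theorem I."""
    return 1.0 - n * b ** (n - 1) + (n - 1) * b ** n

def smallest_sample_size(b, prob):
    """Smallest n for which at least the proportion b of the population lies
    between the sample extremes with probability at least prob."""
    n = 2
    while coverage_probability(n, b) < prob:
        n += 1
    return n

print(smallest_sample_size(0.99, 0.95))
```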

The transcendental equation which arises in determining the value
of n for problems of this type is not easy to solve; consequently a simple
approximate solution would be highly desirable. Such an approximation
exists in the formula

$$n = \frac{1}{4}\,\chi_\alpha^2\,\frac{1 + \beta}{1 - \beta} + \frac{1}{2}$$

where $\beta$ is the proportion of the population to be covered by the sample
range, $\alpha$ is 1 minus the desired probability, and $\chi_\alpha^2$ is the value of $\chi^2$
for 4 degrees of freedom for which $P[\chi^2 > \chi_\alpha^2] = \alpha$. If this formula
is applied to the problem that was just solved, to the nearest integer

$$n = \frac{1}{4}(9.488)\,\frac{1.99}{0.01} + \frac{1}{2} = 473$$
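The approximation is easily evaluated. In this sketch the critical value 9.488 is taken from the text rather than computed, and the function name is illustrative.

```python
def approximate_sample_size(beta, chi2_alpha):
    """n = (1/4) * chi2_alpha * (1 + beta)/(1 - beta) + 1/2."""
    return 0.25 * chi2_alpha * (1 + beta) / (1 - beta) + 0.5

# 9.488 is the 5% critical value of chi-square for 4 degrees of freedom.
n = approximate_sample_size(beta=0.99, chi2_alpha=9.488)
print(round(n))
```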
RANDOMNESS OF SEQUENCES

Most of the statistical methods that have been considered thus far 
were designed to be applied to data for which no useful information 
was gained by preserving the order of the observations. Some of these 
methods nevertheless proved to be valuable for situations in which 
order was important. For example, quality control charts for percentages
and for means employed such methods.

Situations arise, however, in quality control work in which the 
control-chart technique fails to capitalize on the information available 
in the time-order relationships of the data. For example, the occurrence 
of numerous slight erratic shifts in the mean might continue unnoticed 
in the customary control chart. Methods are available which concern 
themselves with such behavior patterns and which therefore serve as 
additional tools in discovering a lack of randomness in sequences of 
observations when it exists. These methods assume that the observations
constitute a random sequence and then determine a function
of the observations for testing the randomness of the sequence. Some
of these methods also have the desirable property of being
non-parametric. Two such methods will be discussed in this chapter.

1. Runs 

Consider the following set of values of a variable: 

20, 23, 18, 17, 24, 16, 17, 21, 22, 26, 15, 16 

If each value is assigned the letter a provided that it is less than the
median 19 and the letter b provided that it is greater than 19, this set
of values will give rise to the following set of letters:

b, b, a, a, b, a, a, b, b, b, a, a

A sequence of i identical letters which is preceded and followed by a
different letter or no letter is called a run of length i. The runs of a’s
and b’s in this example have the lengths 2, 2, 1, 2, 3, 2.

If a set of values of a variable has been obtained by taking samples
at regular time intervals, and if there is, say, a trend or a cyclical pattern
in the sequence of values, the fact can be discovered by studying
the runs of a’s and b’s that result from the sequence when a value less
than the median is assigned the letter a and a value greater than the
median is assigned the letter b. In the event of an upward trend, there
would be a tendency for long runs of a’s to be followed by long runs of
b’s. In a regular cyclical pattern, the runs would tend to be of uniform
length. It appears from illustrations such as these that the study of
runs would prove particularly useful in testing for randomness of data
that have been ordered with respect to time.

For the purpose of deriving a test of randomness, consider a sample
of size n containing $n_a$ a’s and $n_b$ b’s where $n = n_a + n_b$. Let $r_a$ and $r_b$
denote the total number of runs of a’s and b’s, respectively. Now consider
the basic problem of finding the probability of obtaining specified
values of $r_a$ and $r_b$ under the assumption that all possible permutations
of the a’s and b’s have the same probability of occurring. The desired
probability will be given by the ratio of the number of permutations
possible when n, $n_a$, $n_b$, $r_a$, and $r_b$ are held fixed to the number of
permutations possible when n, $n_a$, and $n_b$ are held fixed. This assumes that
only samples of size n which give rise to $n_a$ a’s and $n_b$ b’s are being
considered and that the two statistical variables here are $r_a$ and $r_b$.

The denominator of this probability ratio, which is the number of
permutations when n, $n_a$, and $n_b$ are held fixed, is equal to the number
of ways of permuting n things of which $n_a$ are alike and $n_b$ are alike.
By (22), Chapter III, the denominator is

$$\frac{n!}{n_a!\,n_b!} \tag{7}$$

For the purpose of counting the number of permutations when $r_a$
and $r_b$ are also held fixed, concentrate first upon the a’s. The permutations
of the a’s will be obtained in two steps. First the number of
permutations for a fixed set of run lengths, $r_a$ in number, will be
determined. Then these permutations will be summed for all possible sets
of $r_a$ run lengths. In this connection it is convenient to
study the multinomial

$$(p_1 + p_2 + \cdots + p_{n_a})^{r_a} \tag{8}$$

Consider the general term in the expansion of this multinomial,
namely,

$$C\,p_1^{n_1} p_2^{n_2} \cdots p_{n_a}^{n_{n_a}}$$

where the explicit value of C is given by (39), Chapter III, and where

$$\sum_{i=1}^{n_a} n_i = r_a \tag{9}$$

Each run of a’s of length i may be thought of as being replaced by the
letter $x_i$; then the first step is to count the number of permutations of
the x’s. If $p_i$ is associated with $x_i$, and $n_i$ denotes the number of runs
of length i, then from (39), Chapter III, it follows that C gives precisely
the number of such permutations. This concludes the first step in the
counting procedure. The second step consists in summing coefficients
like C for all sets of run lengths which satisfy (9). Because $n_a$ is also
being held fixed here, it follows that the $n_i$ must satisfy

$$\sum_{i=1}^{n_a} i\,n_i = n_a \tag{10}$$

in addition to (9). To accomplish the summing of coefficients like C,
replace $p_i$ by $p^i$ in (8). Then (8) will assume the form

$$(p + p^2 + p^3 + \cdots + p^{n_a})^{r_a} \tag{11}$$

and the general term will assume the form

$$C\,p^{\,n_1 + 2n_2 + 3n_3 + \cdots + n_a n_{n_a}} \tag{12}$$


From (12) it will be observed that the term in the expansion of (11)
which contains $p^{n_a}$ satisfies condition (10). Since this is the only term
for which (10) is satisfied, the coefficient of $p^{n_a}$ gives the sum of
coefficients like C satisfying both (9) and (10). But this is precisely the sum
of the permutations of the a’s for a fixed set of run lengths summed over
all sets of run lengths satisfying (9) and (10). The coefficient of $p^{n_a}$
therefore gives the desired permutations. This coefficient may be
found by means of the identity

$$(p + p^2 + p^3 + \cdots)^{r_a} = \frac{p^{r_a}}{(1 - p)^{r_a}} = p^{r_a} \sum_{j=0}^{\infty} \frac{(r_a - 1 + j)!}{j!\,(r_a - 1)!}\,p^j$$

The coefficient of $p^{n_a}$ in (11) is the same as the coefficient of $p^{n_a}$ on the
left side of this identity and hence is the same as the coefficient of $p^{n_a}$
on the right side. The latter coefficient is given by the term for which
$j = n_a - r_a$ and is

$$\frac{(n_a - 1)!}{(r_a - 1)!\,(n_a - r_a)!}$$

This expression gives the desired number of permutations of the a’s
for $r_a$ and $n_a$ held fixed. By analogy the number of permutations of
the b’s for $r_b$ and $n_b$ held fixed is

$$\frac{(n_b - 1)!}{(r_b - 1)!\,(n_b - r_b)!}$$
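The counting result can be verified directly for small values. The following sketch, with illustrative function names, enumerates every ordered set of $r_a$ positive run lengths and counts those that total $n_a$; the count is then compared with $(n_a - 1)!/[(r_a - 1)!\,(n_a - r_a)!]$.

```python
from itertools import product
from math import factorial

def count_by_enumeration(n_a, r_a):
    """Number of ordered sets of r_a positive run lengths that total n_a."""
    return sum(1 for lengths in product(range(1, n_a + 1), repeat=r_a)
               if sum(lengths) == n_a)

def count_by_formula(n_a, r_a):
    """(n_a - 1)! / [(r_a - 1)! (n_a - r_a)!], the coefficient of p^(n_a)."""
    return factorial(n_a - 1) // (factorial(r_a - 1) * factorial(n_a - r_a))

n_a = 6
for r_a in range(1, n_a + 1):
    print(r_a, count_by_enumeration(n_a, r_a), count_by_formula(n_a, r_a))
```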
If a sequence of a’s and b’s begins and ends with the same letter,
$r_a$ and $r_b$ will differ by 1. If the sequence begins with one letter and
ends with the other, $r_a$ and $r_b$ will be equal. In the latter case, a given
arrangement of the a’s can be fitted together with a given arrangement
of the b’s in two ways by beginning with either an a or a b. If k denotes
the number of ways in which the a’s and b’s can be fitted together, the
number of permutations of the a’s and b’s together will be

$$k\,\frac{(n_a - 1)!}{(r_a - 1)!\,(n_a - r_a)!} \cdot \frac{(n_b - 1)!}{(r_b - 1)!\,(n_b - r_b)!}$$

If this result is combined with that in (7), the following theorem results.

Theorem II. If the various permutations of $n_a$ letters a and $n_b$ letters b
have the same probability of occurring and if $r_a$ and $r_b$ denote the total
number of runs of a’s and b’s respectively, then the joint distribution
function of $r_a$ and $r_b$ is given by

$$P(r_a, r_b) = k\,\frac{(n_a - 1)!\,(n_b - 1)!\,n_a!\,n_b!}{(r_a - 1)!\,(n_a - r_a)!\,(r_b - 1)!\,(n_b - r_b)!\,n!}$$

where k = 2 if $r_a = r_b$ and k = 1 if $r_a \ne r_b$.

Although the a’s and b’s were introduced by means of the relationship
of sample values to their median, the derivation of this theorem
does not place any restriction on what value of a variable should be 
selected for assigning letters to the sample values. 

It will be observed that this theorem is not concerned with the form 
of any basic variable distribution function; consequently any test 
derived directly from this theorem will be a non-parametric test. 
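Theorem II can be checked against a direct enumeration for small samples. The sketch below assumes $n_a = 4$ and $n_b = 3$ purely for illustration; it tabulates $(r_a, r_b)$ over all distinct arrangements of the letters and compares the relative frequencies with the formula. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import groupby, permutations
from math import factorial

def joint_probability(n_a, n_b, r_a, r_b):
    """P(r_a, r_b) from Theorem II; zero for pairs that cannot occur."""
    if abs(r_a - r_b) > 1 or not (1 <= r_a <= n_a and 1 <= r_b <= n_b):
        return Fraction(0)
    k = 2 if r_a == r_b else 1
    n = n_a + n_b
    numerator = (k * factorial(n_a - 1) * factorial(n_b - 1)
                 * factorial(n_a) * factorial(n_b))
    denominator = (factorial(r_a - 1) * factorial(n_a - r_a)
                   * factorial(r_b - 1) * factorial(n_b - r_b) * factorial(n))
    return Fraction(numerator, denominator)

def enumerated_distribution(n_a, n_b):
    """Relative frequency of each (r_a, r_b) over all distinct arrangements."""
    arrangements = set(permutations("a" * n_a + "b" * n_b))
    counts = Counter()
    for arrangement in arrangements:
        runs = [letter for letter, _ in groupby(arrangement)]
        counts[(runs.count("a"), runs.count("b"))] += 1
    return {pair: Fraction(c, len(arrangements)) for pair, c in counts.items()}

for (r_a, r_b), freq in sorted(enumerated_distribution(4, 3).items()):
    print(r_a, r_b, freq, joint_probability(4, 3, r_a, r_b))
```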

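One of the interesting and useful applications of this theorem arises
in connection with finding the probability of obtaining $u = r_a + r_b$
total runs of a’s and b’s when $n_a$ and $n_b$ are fixed. In order to find this
probability, which will be denoted by P(u), it is necessary to sum
$P(r_a, r_b)$ over all values of $r_a$ and $r_b$ that give rise to this value of u.
If u is even, $r_a = r_b = u/2$; consequently there is but one pair of values
to be considered. If u is odd, $r_a = (u \pm 1)/2$ and $r_b = (u \mp 1)/2$;
consequently there are but two pairs of values to be considered. It
therefore follows from Theorem II that

$$P(u) = \frac{2\,(n_a - 1)!\,(n_b - 1)!\,n_a!\,n_b!}{\left[\left(\frac{u}{2} - 1\right)!\right]^2 \left(n_a - \frac{u}{2}\right)!\,\left(n_b - \frac{u}{2}\right)!\,n!}$$

if u is even, and

$$P(u) = \frac{(n_a - 1)!\,(n_b - 1)!\,n_a!\,n_b!}{\left(\frac{u-1}{2}\right)!\,\left(\frac{u-3}{2}\right)!\,n!} \left[ \frac{1}{\left(n_a - \frac{u+1}{2}\right)!\,\left(n_b - \frac{u-1}{2}\right)!} + \frac{1}{\left(n_a - \frac{u-1}{2}\right)!\,\left(n_b - \frac{u+1}{2}\right)!} \right]$$

if u is odd.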

These probabilities have been used to construct tables of
$\sum_{u=2}^{u'} P(u)$ for various values of $n_a$, $n_b$, and u'. Such tables enable one to test
whether a sample value of u is unusually large or small as compared to
what would be expected if the sequence of values constituted a random
sequence. In order to illustrate the use of such tables, a few entries
have been extracted from one of them and have been recorded in
Table 2. In this table $u_{0.05}$ and $u_{0.95}$ are the largest and smallest
integers, respectively, such that $P[u \le u_{0.05}] \le 0.05$ and
$P[u \le u_{0.95}] \ge 0.95$. These values may therefore be used as 5% critical values for
testing whether u is unusually small or unusually large, respectively.
Table 2 requires that $n_a = n_b$; however, if the median of a set of
observations is chosen for assigning letters, and if the sample is fairly
large, this requirement will usually be approximately satisfied.

TABLE 2

n_a = n_b    5   10   15   20   25   30   40   50   60   70   80   90  100
u_0.05       3    6   11   15   19   24   33   42   51   60   70   79   88
u_0.95       8   15   20   26   32   37   48   59   70   81   91  102  113
u_0.025      2    6   10   14   18   22   31   40   49   58   68   77   86
u_0.975      9   15   21   27   33   39   50   61   72   83   93  104  115
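Entries of Table 2 can be recomputed from the distribution of u. The sketch below writes the factorial ratios of Theorem II as binomial coefficients, which is only a rearrangement, since $(n_a - 1)!/[(r_a - 1)!\,(n_a - r_a)!]$ is the number of combinations of $n_a - 1$ things taken $r_a - 1$ at a time. The function names are illustrative, and exact fractions are used so that the comparisons with 0.05 and 0.95 are exact.

```python
from fractions import Fraction
from math import comb

def total_runs_pmf(n_a, n_b):
    """Return {u: P(u)} for the total number of runs u = r_a + r_b."""
    total = comb(n_a + n_b, n_a)
    pmf = {}
    for u in range(2, n_a + n_b + 1):
        if u % 2 == 0:
            r = u // 2
            ways = 2 * comb(n_a - 1, r - 1) * comb(n_b - 1, r - 1)
        else:
            hi, lo = (u + 1) // 2, (u - 1) // 2
            ways = (comb(n_a - 1, hi - 1) * comb(n_b - 1, lo - 1)
                    + comb(n_a - 1, lo - 1) * comb(n_b - 1, hi - 1))
        pmf[u] = Fraction(ways, total)
    return pmf

def critical_values(n_a, n_b, alpha):
    """Largest u with P[u <= .] <= alpha and smallest u with P[u <= .] >= 1 - alpha."""
    cumulative, lower, upper = Fraction(0), None, None
    for u, prob in sorted(total_runs_pmf(n_a, n_b).items()):
        cumulative += prob
        if cumulative <= alpha:
            lower = u
        if upper is None and cumulative >= 1 - alpha:
            upper = u
    return lower, upper

for size in (5, 10):
    print(size, critical_values(size, size, Fraction(1, 20)),
          critical_values(size, size, Fraction(1, 40)))
```

For sample sizes of 5 and 10 the values printed should agree with the first two columns of Table 2.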

As a numerical illustration, consider the data introduced at the
beginning of the section on runs. There u = 6, $n_a = 6$, and $n_b = 6$.
It is clear from interpolating in Table 2 that the value of u is not
significant; consequently there is no reason for doubting the randomness
of the sequence as far as this test is concerned.

As a second illustration, consider the data of problem 24, Chapter
III. There are reasons for believing that the expected percentage may
be shifting here from time to time; hence consider testing the hypothesis
of randomness against the alternative of too many long runs. If these
percentages are assigned letters on the basis of lying below or above
2.55, it will be found that the resulting sequence of a’s and b’s is

a, a, a, a, b, a, a, a, a, b, a, b, b, b, a, b, b, b, b,
b, b, b, a, b, a, a, a, a, a, a, a, a, b, b, a, b, b, b

Here $n_a = 20$, $n_b = 18$, and u = 14. Interpolation in Table 2 for
$n_a = n_b = 19$ would give $u_{0.05} = 14$. Since this result is significant at
the 5% level for a one-sided test, the hypothesis of randomness would
be rejected. There appear to be too few runs because of too many
very long runs; consequently an investigation of the long runs should
be made.

Several other tests for randomness are based on functions of runs. 
One such test, for example, is based upon the probability of obtaining 
at least one run of a length greater than a specified length. Such a 
test might be helpful in the problem just considered, since a run of 
length 8 for such a short sequence seems unlikely. 
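The counts used in the second illustration can be recovered from the sequence itself, and the interpolation in Table 2 can be replaced by an exact calculation. The sketch below sums Theorem II, written with binomial coefficients, over all pairs $(r_a, r_b)$ giving at most u total runs; the function name is illustrative.

```python
from fractions import Fraction
from itertools import groupby
from math import comb

# The 38 letters of the second illustration, written without separators.
sequence = "aaaabaaaababbbabbbb" "bbbabaaaaaaaabbabbb"

runs = [(letter, len(list(group))) for letter, group in groupby(sequence)]
n_a, n_b, u = sequence.count("a"), sequence.count("b"), len(runs)
longest = max(length for _, length in runs)

def prob_at_most(n_a, n_b, u):
    """P[total runs <= u], summing Theorem II over admissible (r_a, r_b)."""
    ways = 0
    for r_a in range(1, n_a + 1):
        for r_b in range(1, n_b + 1):
            if r_a + r_b <= u and abs(r_a - r_b) <= 1:
                k = 2 if r_a == r_b else 1
                ways += k * comb(n_a - 1, r_a - 1) * comb(n_b - 1, r_b - 1)
    return Fraction(ways, comb(n_a + n_b, n_a))

print(n_a, n_b, u, longest)
print(float(prob_at_most(n_a, n_b, u)))
```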

2. Serial Correlation 

Another approach to testing the randomness of a sequence of observations
can be made by means of correlation methods. If a set of observations
is ordered with respect to time and if time is irrelevant to the
variable being considered, no correlation would be expected to exist, 
for example, between successive pairs of values of the sequence. If 
the distribution function of a correlation coefficient of this type could 
be found, it would be possible to test the hypothesis that the population 
correlation was zero and thus test the sequence for randomness. The 
derivation of such distribution functions is complicated; consequently 
only the results of one such derivation will be described here. 

If $x_1, x_2, \ldots, x_n$ denotes the sequence to be tested, it is clear that
the cross-product term

$$R = \sum_{i=1}^{n} x_i x_{i+1} \tag{13}$$

where $x_{n+1} = x_1$, differs considerably from the cross-product term
$\sum_{i=1}^{n} x_i y_i$ in the ordinary correlation coefficient. The $x_i$ and $x_{i+1}$ no
longer constitute a set of random sample pairs of values of two variables;
consequently it is hardly to be expected that the correlation
coefficient of successive pairs, which is called the serial correlation
coefficient with lag 1, will possess the same distribution function as the
ordinary correlation coefficient. It is therefore not possible to use the
significance test for r explained in Chapter V to test serial correlations.
Useful approximations to the distribution function of the serial correlation
coefficient, however, are available for two different situations.
Tables of critical values have been worked out under the assumption
that the variable x is normally distributed. Formulas have also been
worked out which are satisfactory for large samples but which require
no assumption as to the form of f(x). Since the latter approach is
non-parametric, it alone will be discussed here.

If all possible permutations of the sequence being considered are 
treated as equally likely to occur, it is theoretically possible to determine
the probability that R in (13) will assume any given value by
calculating R for all possible permutations and then counting the number
of such permutations that produce the given value. Such calculations
become exceedingly lengthy except for small values of n. For
large values of n, it turns out that R possesses an approximate normal 
distribution which may be used to test the hypothesis of zero serial 
correlation. For such a test, only the mean and variance of R are 
necessary. These values are given by the formulas 


$$E(R) = \frac{S_1^2 - S_2}{n - 1}$$

and

$$\sigma_R^2 = \frac{S_2^2 - S_4}{n - 1} + \frac{S_1^4 - 4S_1^2 S_2 + 4S_1 S_3 + S_2^2 - 2S_4}{(n - 1)(n - 2)} - E^2(R)$$

where

$$S_k = x_1^k + x_2^k + \cdots + x_n^k$$


It can be shown that a test based upon R is equivalent to a test 
based upon the serial correlation coefficient with lag 1; consequently 
the test based upon R is selected because of its simpler form. 
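For a short sequence the formulas for the mean and variance of R can be compared with the values obtained by calculating R for every permutation. The sketch below does this for a hypothetical sequence of five integers; the sequence and the function names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

def circular_cross_product(x):
    """R = sum of x_i * x_(i+1), with x_(n+1) taken to be x_1."""
    n = len(x)
    return sum(x[i] * x[(i + 1) % n] for i in range(n))

def mean_and_variance_of_R(x):
    """E(R) and the variance of R from the formulas in terms of S_1, ..., S_4."""
    n = len(x)
    s1, s2, s3, s4 = (sum(v**k for v in x) for k in (1, 2, 3, 4))
    mean = Fraction(s1**2 - s2, n - 1)
    variance = (Fraction(s2**2 - s4, n - 1)
                + Fraction(s1**4 - 4 * s1**2 * s2 + 4 * s1 * s3 + s2**2 - 2 * s4,
                           (n - 1) * (n - 2))
                - mean**2)
    return mean, variance

def enumerated_mean_and_variance(x):
    """Mean and variance of R over all equally likely permutations of x."""
    values = [circular_cross_product(p) for p in permutations(x)]
    mean = Fraction(sum(values), len(values))
    variance = Fraction(sum(v * v for v in values), len(values)) - mean**2
    return mean, variance

x = [2, 5, 1, 4, 3]  # hypothetical integer observations
print(mean_and_variance_of_R(x))
print(enumerated_mean_and_variance(x))
```

The enumeration grows as n!, so it serves only as a check for very small n; the formulas are what make the test practical for large samples.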
EXERCISES

1. Compare the $P[\,|x - m| > 2\sigma\,]$ given by Tchebycheff’s inequality with the
actual value for (a) the rectangular distribution f(x) = 1, $0 \le x \le 1$, (b) $f(x) = e^{-x}$,
$x \ge 0$.

2. Derive Tchebycheff’s inequality for a discrete variable.

3. Use Tchebycheff’s inequality to show that the $P[(\bar{x} - m)^2 > c] \to 0$ as
$n \to \infty$.

4. For $f(x) = cx^2e^{-x}$, $x \ge 0$, determine the $P[\,|x - m| > 2\sigma\,]$. Compare this
value with the value given by Tchebycheff’s inequality.

5. For a sample of 200, determine what the probability is that at least 99% of
the population will be included between the extreme values of the sample.

6. Determine how large a sample is necessary in order that the probability
will be 0.80 that at least 99% of the population will be expected to lie between the
extreme values of the sample.

7. Derive formulas for the mean and variance of z where z is the proportion of
the population lying between the extreme values of the sample.

8. From the variance of z obtained in the preceding problem, would you expect
the estimated percentage of the population lying between the extreme values of a
sample of 100 to be an accurate estimate of the actual percentage?

9. Following the methods that were used to derive Theorem I, derive the distribution
function of the proportion of the population which lies between the rth
smallest and the rth largest values in a random sample of size n.

10. Toss a coin 30 times, recording the sequence of heads and tails, and then
test for randomness by means of Table 2.

11. Write down a sequence of a’s and b’s totaling 50 letters which you feel is
random. Test this sequence for randomness by means of Table 2.

12. The defective parts from a day's production of a certain machine were recorded
as a if too narrow, and as b if too wide, with the following results: a, a, a,
b, b, a, a, b, b, b, a, b, b, b, b, b, b, b, b. By means of the more complete tables for
total runs found on page 70 of the March 1943 issue of Annals of Mathematical
Statistics, determine whether the machine seems to be shifting slightly towards
parts that are too wide.

13. Test the following set of measurements for a trend by means of serial correlation,
applying the formulas to the measurements after reduction by a convenient
common difference: 28, 32, 37, 25, 31, 29, 33, 28, 27, 28, 23, 22, 18, 17.

TESTING GOODNESS OF FIT

THE $\chi^2$ DISTRIBUTION

1. Nature of $\chi^2$

A problem that arises frequently in statistical work is the testing
of the compatibility of a set of observed and theoretical frequencies.
If it is reasonable on the basis of a test to assume that a set of observed
frequencies might have been obtained in random sampling from a
population that has a specified set of corresponding theoretical frequencies,
the two sets of frequencies will be said to be compatible as
far as that test is concerned. This type of problem was solved in
Chapter III for the case in which the set consisted of only one pair of
observed and theoretical frequencies and in which the binomial distribution
was used to obtain the theoretical frequencies. A generalization
of the binomial problem to k pairs of observed and theoretical frequencies
will give rise to the multinomial distribution of Chapter III,
and the solution of the problem will require a simple approximation
to the multinomial distribution similar to the normal approximation
to the binomial distribution. It turns out that the familiar $\chi^2$ distribution
function serves as an excellent approximation to the multinomial
for large samples and therefore that tests of compatibility can
be based upon it.

As a simple illustration of the type of problem being discussed, consider
the problem of testing the “honesty” of a die. Suppose that a die
is rolled 60 times and a record is kept of the number of times each face
comes up. The statement that a die is honest means that each face
has the probability 1/6 of appearing in a single roll. Therefore each
face would be expected to show 10 times in an experiment of this kind.
Suppose that the first of a contemplated sequence of experiments
produced the following result, where the rows labeled o and e represent
the observed and theoretical frequencies respectively.

Face    1    2    3    4    5    6
o       6   15    7    4   17   11
e      10   10   10   10   10   10

Now, as a measure of the compatibility of such observed and expected
frequencies, it is customary to calculate the statistic called $\chi^2$, which is
defined by

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} \tag{1}$$

where k is the number of pairs of frequencies to be compared, $o_i$ and
$e_i$ denote these frequencies, and $\sum o_i = \sum e_i = n$. In this problem k = 6
and

$$\chi^2 = \frac{(6 - 10)^2}{10} + \frac{(15 - 10)^2}{10} + \frac{(7 - 10)^2}{10} + \frac{(4 - 10)^2}{10} + \frac{(17 - 10)^2}{10} + \frac{(11 - 10)^2}{10} = 13.6$$

A value of zero here would correspond to exact agreement with expectation,
whereas increasingly large values of $\chi^2$ may be thought of as
corresponding to increasingly poor agreement. Now if this experiment
were repeated a large number of times and each time the value of $\chi^2$
were computed, a set of $\chi^2$’s would be obtained which could be classified
into a relative frequency table of $\chi^2$’s. This relative frequency table
would tell one approximately in what percentage of such experiments
various ranges of values of $\chi^2$ could be expected to be obtained. Then
one would be able to judge whether the value of $\chi^2 = 13.6$ was unusually
large as compared to the run of $\chi^2$’s in such experiments.
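The calculation of $\chi^2$ from (1) is easily mechanized. The following is a minimal sketch for the die experiment; the function name is illustrative.

```python
def chi_square(observed, expected):
    """Sum of (o_i - e_i)^2 / e_i over the k pairs of frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [6, 15, 7, 4, 17, 11]
# An honest die gives each face the same expected frequency, n/6.
expected = [sum(observed) / len(observed)] * len(observed)
print(round(chi_square(observed, expected), 1))
```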

If the relative frequency table of $\chi^2$’s were graphed as a histogram,
the histogram would serve as an approximation to the true frequency
distribution of $\chi^2$ for an unlimited number of such experiments. Now
the true frequency distribution of $\chi^2$ is discrete, and hence it would be
represented by a histogram also because there are only a finite number
of values that $\chi^2$ can assume, since all the $o_i$ are integers ranging from 0
to 60. For example, the smallest positive value that $\chi^2$ can assume in this
problem is 0.2, which is obtained when two of the observed frequencies
are 9 and 11 and the remaining frequencies are 10 each. However, in
most problems of this type, the number of possible values of $\chi^2$ is
large; consequently the histogram representing the true $\chi^2$ distribution
is usually quite regular and can therefore be approximated by means
of a curve.

The preceding discussion has been concerned with how one could
proceed empirically to find an approximation to the distribution function
of $\chi^2$. However, by using the multinomial distribution function
and making certain approximations, it is possible to obtain an approximation
to the distribution function of $\chi^2$ by theoretical methods. Since
the derivation of this approximation is not simple, it will not be considered
here; however, such a derivation shows that for large samples
an excellent approximation to the distribution function of $\chi^2$ is given by

$$f(\chi^2) = \frac{(\chi^2)^{\frac{k-3}{2}}\, e^{-\frac{\chi^2}{2}}}{2^{\frac{k-1}{2}}\, \Gamma\!\left(\frac{k-1}{2}\right)} \tag{2}$$

If this function is compared with that given by (8), Chapter VIII,
it will be observed that (2) is what was previously called the $\chi^2$ distribution
function with k - 1 degrees of freedom. As a matter of fact, the
name $\chi^2$ was given to the function (8), Chapter VIII, because of its
identity with (2).

Now consider the application of (2) to the particular problem being
discussed. Since large values of $\chi^2$ correspond to one’s notion of poor
experimental results, it is customary to select a value $\chi_0^2$ such that
$P[\chi^2 > \chi_0^2] = 0.05$ as a critical value for judging significance at the
5% level. Since $f(\chi^2)$ depends only on the parameter $\nu$, called the
number of degrees of freedom, there will correspond a $\chi^2$ curve and a
5% critical value to each value of $\nu$. The graph of $f(\chi^2)$ corresponding
to various degrees of freedom is given in Fig. 4, Chapter VIII. Since
(2) is $f(\chi^2)$ with $\nu = k - 1$ and k = 6 in this problem, the graph corresponding
to $\nu = 5$ shows the distribution of $\chi^2$’s to be expected in this
problem. From Table III it will be found that $\chi_0^2 = 11.1$ for 5 degrees
of freedom; hence the value of $\chi^2 = 13.6$ is significant, and the honesty
of the die is highly questionable. If the die were honest, and repeated
experiments of rolling the die 60 times were conducted, then in less
than 5% of such experiments would a value of $\chi^2$ be obtained which
exceeded this first experimental result.
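In place of Table III, the tail probability can be approximated by integrating the $\chi^2$ density with $k - 1$ degrees of freedom numerically. The sketch below uses Simpson's rule, which is a choice made here for illustration and not a method of the text; it assumes $k \ge 4$ so that the density is finite at zero, and the function names and the number of steps are illustrative.

```python
from math import exp, gamma

def chi_square_density(x, k):
    """Chi-square density for k cells, that is, k - 1 degrees of freedom."""
    return (x ** ((k - 3) / 2) * exp(-x / 2)
            / (2 ** ((k - 1) / 2) * gamma((k - 1) / 2)))

def upper_tail(x0, k, steps=2000):
    """P[chi^2 > x0], computed as 1 minus Simpson's-rule integral over (0, x0)."""
    h = x0 / steps  # steps must be even for Simpson's rule
    total = chi_square_density(0.0, k) + chi_square_density(x0, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi_square_density(i * h, k)
    return 1.0 - total * h / 3

print(upper_tail(11.07, 6))  # tail beyond the 5% critical value for 5 d.f.
print(upper_tail(13.6, 6))   # tail beyond the observed value for the die
```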

2. Generality of $\chi^2$

Since $f(\chi^2)$ depends only upon the parameter $\nu$, which is always
available in any given problem, the $\chi^2$ test is a non-parametric test,
although it is often applied to parametric problems. The $\chi^2$ distribution
is concerned with the values of the $e_i$ but not with the form of the
distribution function from which they might have been obtained as
samples. This property of the $\chi^2$ distribution permits it to be used on
a wide variety of problems involving a comparison of observed and
theoretical frequencies.

A second feature of the $\chi^2$ distribution which makes the $\chi^2$ test one
of wide applicability arises from the theory which demonstrates that
$\chi^2$ as given by (1) possesses approximately the distribution function
given by (8), Chapter VIII, with $\nu = k - 1$. In that theory it is
shown that, for each independent linear restriction imposed upon the
observations, $o_i$, the value of the parameter $\nu$ is decreased by unity;
otherwise the function is unchanged.

In the die problem, it was assumed that in each experiment the die
was rolled 60 times; consequently the $o_i$ always satisfied the single
linear restriction $\sum_{i=1}^{6} o_i = 60$. In the light of the second feature of the
$\chi^2$ distribution just discussed, this restriction explains why $\nu$ was equal
to 5 in the preceding problem. For most problems the determination
of the value of $\nu$ can be made intuitively by counting the number of
independent cell frequencies. Since these 6 cell frequencies must total
60, there are only 5 independent cell frequencies, which is therefore the
number of the degrees of freedom here. This connection between the
physical idea of degrees of freedom and the parameter $\nu$ in $f(\chi^2)$ explains
why that parameter is called the number of degrees of freedom.

3. Applications 

In experiments on the breeding of flowers of a certain species, an 
experimenter obtained 120 magenta flowers with a green stigma, 48 
magenta flowers with a red stigma, 36 red flowers with a green stigma, 
and 13 red flowers with a red stigma. Theory predicts that flowers of 
these types should be obtained in the ratio of 9:3:3:1. Are these 
experimental results compatible with the theory? On the basis of this
theory, the observed and expected frequencies, correct to the nearest
integer, are given by

    o    120    48    36    13
    e    122    41    41    14

Calculations give 


$$\chi^2 = \frac{(120 - 122)^2}{122} + \frac{(48 - 41)^2}{41} + \frac{(36 - 41)^2}{41} + \frac{(13 - 14)^2}{14} = 1.9$$


From Table III the 5% critical value of $\chi^2$ for 3 degrees of freedom
is $\chi_0^2 = 7.8$; consequently the result is not significant. There is no
reason on the basis of this test for doubting the theory here.
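
The arithmetic of this example can be checked with a short sketch. The helper name `chi_square` and the variable names are illustrative choices, not part of the text; the expected frequencies are rounded to the nearest integer to match the calculation above, and the critical value is simply the Table III figure quoted in the text.

```python
# Minimal sketch of the flower-breeding calculation, assuming the
# 9:3:3:1 ratio and the rounded expected frequencies used in the text.

def chi_square(observed, expected):
    """Return the sum of (o - e)^2 / e over all cells (hypothetical helper)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [120, 48, 36, 13]
ratio = [9, 3, 3, 1]
total = sum(observed)  # 217 flowers in all

# Expected frequencies, rounded to the nearest integer as in the text.
expected = [round(total * r / sum(ratio)) for r in ratio]

chi2 = chi_square(observed, expected)
degrees_of_freedom = len(observed) - 1  # one linear restriction: fixed total
critical_5pct = 7.8  # value quoted from Table III for 3 degrees of freedom

print(expected, round(chi2, 1), chi2 > critical_5pct)
```

Because the computed value falls well below the quoted critical value, the sketch reaches the same verdict as the text: the result is not significant.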

As a second application, consider the following data on the number 
of aircraft accidents that occurred during the various days of the week 
for a given period of time at certain training bases. First, consider 


    Sunday   Monday   Tuesday   Wednesday   Thursday   Friday   Saturday
      14       16        8         12          11         9        14


the question whether accidents are uniformly distributed over the week.
Since the total number of accidents under consideration is 84, the
expected frequencies are 12 each. Computations with $e_i = 12$ give
$\chi^2 = 4.2$. Since $\chi_0^2 = 12.6$ for 6 degrees of freedom, the observed
frequencies are compatible with those based on the assumption of
homogeneity. Second, consider the question whether social activities
of the week end affect the accident rate. If Saturday, Sunday, and
Monday are treated as the days that would be so affected, the problem
reduces to a comparison of frequencies after the data have been
combined into the two groups


    Week end    Remaining days
       44             40

On the basis of a uniform distribution over the week, the expected
frequencies are now 36 and 48 respectively. Computations then give
$\chi^2 = 3.1$. Since this value is so near the critical value of 3.8 for 1
degree of freedom, one would prefer to suspend judgment until further
data were made available.
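
Both parts of the aircraft-accident example can be reproduced with the following sketch. The names `chi_square`, `accidents`, and `weekend_days` are hypothetical; the grouping of Saturday, Sunday, and Monday follows the text.

```python
# Minimal sketch of the two accident tests, assuming a uniform
# distribution of accidents over the seven days of the week.

def chi_square(observed, expected):
    """Return the sum of (o - e)^2 / e over all cells (hypothetical helper)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

accidents = {"Sunday": 14, "Monday": 16, "Tuesday": 8, "Wednesday": 12,
             "Thursday": 11, "Friday": 9, "Saturday": 14}
total = sum(accidents.values())  # 84 accidents

# First question: are accidents uniform over the seven days?
daily_expected = [total / 7] * 7
chi2_daily = chi_square(accidents.values(), daily_expected)

# Second question: week-end group (3 days) against the remaining 4 days.
weekend_days = ("Saturday", "Sunday", "Monday")
weekend = sum(accidents[day] for day in weekend_days)
grouped_observed = [weekend, total - weekend]
grouped_expected = [total * 3 / 7, total * 4 / 7]
chi2_grouped = chi_square(grouped_observed, grouped_expected)

print(round(chi2_daily, 1), grouped_observed, round(chi2_grouped, 1))
```

Note that combining cells changes both the value of $\chi^2$ and the number of degrees of freedom (6 before grouping, 1 after), so each result must be compared with its own critical value.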

4. Limitations 

Since the $\chi^2$ curve is only an approximation to the true distribution
for problems of this type, care must be exercised so that the $\chi^2$ test
will be used only when this approximation is good. Experience and
theoretical investigations indicate that the approximation is usually
satisfactory provided that the $e_i \geq 5$ and $k \geq 5$. If $k < 5$, it is
best to have the $e_i$ somewhat larger than 5.

If some of the cell frequencies, $e_i$, do not exceed 5, they may possibly
be combined with other cell frequencies until the condition is satisfied.
In the illustration on aircraft accidents, such a combination was made
in the second part of the problem for the purpose of testing a second
hypothesis, but it could equally well have been made of necessity if the
cell frequencies had been too small to satisfy the above conditions.
In any such reduction, of course, it is necessary to calculate the value
of $\chi^2$ and to determine the number of degrees of freedom after the
reduction.

Methods are available for some of the other applications about to be
presented which permit the $\chi^2$ test to be applied with slightly greater
confidence when the above conditions are barely satisfied.


CONTINGENCY TABLES 

A slightly more complicated problem arises in testing the compatibility
of sets of observed and expected frequencies when the frequencies
occur in a two-way table rather than in a one-way table like those
considered thus far. Such two-way tables are often called contingency
tables. As an illustration of a contingency table, consider Table 1
in which are recorded the frequencies, corresponding to the indicated
classifications, from a sample of 400 individuals.

A contingency table is usually constructed for the purpose of studying
the relationship between the two variables of classification. Whether
the variables are unrelated can be tested by means of an adaptation of
the $\chi^2$ test. The problem here is to test whether there is any
relationship between an individual's education and his adjustment to
marriage; therefore set up the hypothesis that there is no
relationship.

TABLE 1

                        Marriage-Adjustment Score

    Education      Very low    Low        High       Very high    Totals
    College        18 (27)     29 (39)    70 (64)    115 (102)     232
    High school    17 (13)     28 (19)    30 (32)     41 (51)      116
    Grades only    11 (6)      10 (9)     11 (14)     20 (23)       52
    Totals         46          67         111         176          400


Now consider repeated sampling experiments of the type from which
these data were obtained. Each experiment consists in selecting 400
people at random from the population in question and classifying them
with respect to their education and marriage-adjustment score. Out
of all such experiments, consider only those which give the same row
and column totals as for this first experiment for which the frequencies
are given in Table 1. For such experiments the percentage of
individuals with some college education, for example, will remain
constant. Now, if there is no relationship between the two variables,
the percentage of individuals with some college education should be the
same for all four of the categories of marriage adjustment. Since there
are 232 college people out of the 400 sampled and the totals do not
change from experiment to experiment, the percentage of college people
to be expected in each of the four categories is 58%, correct to the
nearest per cent. Since the column totals do not change, this percentage
of the column totals will yield the number of college people to be
expected in each of the four columns. The resulting expected
frequencies, 27, 39, 64, and 102, correct to the nearest integer, have
been inserted in parentheses in Table 1 next to their corresponding
observed frequencies. The expected frequencies for the other two rows
were obtained in a similar manner from the marginal totals. A check on
the calculations may be obtained by verifying the row and column totals.
It should be clearly understood that, unless the marginal totals are
held fixed, the frequencies in parentheses would vary from experiment to
experiment and therefore could not be treated as expected frequencies.
By considering only the experiments that yield the same marginal totals,
the problem is reduced to testing the compatibility of a set of observed
and expected frequencies. By restricting the class of experiments for
consideration in this manner, an experiment is judged by comparison
with experiments very similar to it with respect to certain unimportant
characteristics rather than by comparison with a wider class of
experiments. The characteristic of possessing the same marginal totals
is unimportant because it does not influence the relationship of the two
variables.
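
The percentage argument above amounts to a simple formula, sketched here in notation that is not in the text: if $r_i$ denotes the total of row $i$, $c_j$ the total of column $j$, and $N$ the sample size (symbols introduced only for this illustration), the expected frequency of a cell under the hypothesis of no relationship is

$$e_{ij} = \frac{r_i\, c_j}{N}, \qquad \text{for example} \quad e_{11} = \frac{232 \times 46}{400} = 26.68 \approx 27.$$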

In order to determine the number of degrees of freedom in this
problem, it is necessary to know how many independent linear
restrictions are imposed because of the marginal restrictions. If the
observed frequency in the $i$th row and the $j$th column is denoted by
$o_{ij}$, the restriction that the row and column totals shall remain
fixed can be expressed as

$$\sum_{j=1}^{4} o_{1j} = 232, \quad \sum_{j=1}^{4} o_{2j} = 116, \quad \sum_{j=1}^{4} o_{3j} = 52$$

$$\sum_{i=1}^{3} o_{i1} = 46, \quad \sum_{i=1}^{3} o_{i2} = 67, \quad \sum_{i=1}^{3} o_{i3} = 111, \quad \sum_{i=1}^{3} o_{i4} = 176$$

It would appear from this that there are 7 independent linear
restrictions here; however, the sum of the first 3 of these sums equals
the sum of the remaining 4 sums because they both equal the total sample
size. Thus, there are but 6 independent linear restrictions and hence
the number of degrees of freedom here is $12 - 6 = 6$. This result
can be obtained very easily by counting independent cell frequencies.
Since the frequencies in the first row must total 232, only 3 of the 4
cell frequencies are free to vary. Similarly for the second row. The
third-row frequencies, however, are now completely determined because
the column totals are fixed; consequently there are only 6 independent
cell frequencies and hence 6 degrees of freedom. In general, for a
contingency table of $a$ rows and $b$ columns, there will be
$(a - 1)(b - 1)$ degrees of freedom.

The value of $\chi^2$ for Table 1 will be found to be 20.7. Since
$\chi_0^2 = 12.6$ for 6 degrees of freedom, this result is highly
significant. An inspection of the table shows that individuals with some
college education appear to adjust themselves to marriage more readily
than those with less education.
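
The Table 1 calculation can be sketched as follows. The variable names are illustrative only, and the expected frequencies are rounded to the nearest integer because that is how they appear in Table 1; this is a check on the arithmetic under that assumption, not a general-purpose routine.

```python
# Minimal sketch of the contingency-table test for Table 1.
# Rows: College, High school, Grades only.
# Columns: Very low, Low, High, Very high marriage-adjustment score.
observed = [
    [18, 29, 70, 115],
    [17, 28, 30, 41],
    [11, 10, 11, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)  # 400 individuals

# Expected frequency = (row total)(column total) / n, rounded to the
# nearest integer as in Table 1.
expected = [[round(r * c / n) for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

# (a - 1)(b - 1) degrees of freedom for a rows and b columns.
degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)

print(expected, round(chi2, 1), degrees_of_freedom)
```

The rounding step is kept only to match the table; carrying the expected frequencies to more decimal places would change the computed value slightly.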


FREQUENCY CURVE FITTING 

If a theoretical frequency distribution has been fitted to an observed
frequency distribution, the question naturally arises whether the fit
is satisfactory. This question arose, for example, in the exercise on
fitting a normal curve to a histogram in Chapter III. When a normal
curve is fitted to a histogram, it is usually assumed that the data
represent a sample selected at random from a normal population and
that the fitted normal curve is an approximation to the population
curve. From this point of view, the question whether a fit is
satisfactory can be answered only if one knows what sort of histograms
will be obtained in random samples from a normal population.

Now the $\chi^2$ test can be employed to give a partial answer to this
question. Since the $\chi^2$ test is concerned only with comparing sets of
observed and expected frequencies, it is capable of testing only those
features of the fitted distribution that are affected by a lack of
compatibility in the compared sets. For example, the $\chi^2$ test is not
capable of differentiating between a normal curve with a given set of
expected frequencies and any other curve with the same expected
frequencies. With this understanding, consider the problem of testing
the adequacy of the normal curve fit found in Chapter III by means of
the last two columns of Table 1 of that chapter.

If the theoretical frequencies of this table are treated as the expected
frequencies, it will be found that $\chi^2 = 8.4$. There are 10 pairs of
frequencies to be compared in this problem, but there are not 9 degrees
of freedom here as one might expect by analogy with the previous
applications of the test. In this problem the expected frequencies
would change from experiment to experiment because the fitted normal
curve uses the mean and variance of the data rather than their
population values, which are unknown. However, if repeated samples of
size $n$ are drawn from the same normal population and only those results
are retained that give rise to the same mean and variance as for this
one experiment, then the expected frequencies will not change. But
now there will be 3 independent linear restrictions on the $o_i$ because

$$\sum_{i=1}^{k} o_i = n, \quad \sum_{i=1}^{k} x_i o_i = n\bar{x}, \quad \sum_{i=1}^{k} (x_i - \bar{x})^2 o_i = n s^2$$

The first equality requires that the sample size be $n$, the second that
the mean be $\bar{x}$, and the third that the variance be $s^2$; consequently
the number of degrees of freedom would be $\nu = k - 3$. In this
particular problem $\nu = 10 - 3 = 7$ and $\chi_0^2 = 14.1$. Since
$\chi^2 = 8.4$, there is no reason for doubting that the data might have
been obtained from sampling a normal population, as far as compatibility
of corresponding sets of frequencies is concerned, and so the fit in
Fig. 4, Chapter III, would be considered satisfactory from this point of
view.

The situation that was met in this problem is the common one in
fitting theoretical frequency distributions to observed distributions
because the parameters specifying the theoretical distribution are
seldom known and must be estimated from the data. In most problems
of this type, these estimates are determined by means of moments,
which give rise to linear restrictions; consequently the $\chi^2$ test may
be applied provided that 1 degree of freedom is deducted for each
parameter that is replaced by its sample estimate.

If a binomial distribution were fitted to an observed frequency
distribution by determining the 3 independent parameters in
$N(q + p)^n$, where $N$ is the sample size, from the observations, the
number of degrees of freedom in the $\chi^2$ test would be $k - 3$ just as in
normal curve fitting; however, it often happens in binomial problems
that one or more of the parameters will be specified from other
considerations. For example, suppose that one were interested in
studying the sex distribution in families of 8 children each. Here the
value of $n = 8$ does not place any restriction on the $o_i$; consequently
the number of degrees of freedom would be $k - 2$. If one were to assume
equal sex distribution rather than determine $p$ from the observations,
the number of degrees of freedom would be $k - 1$.

Since the fitting of a Poisson distribution involves only two
parameters, the $\chi^2$ test will involve $k - 2$ or $k - 1$ degrees of
freedom, depending upon whether $m$ is replaced by its sample value or is
known from other considerations. The illustrative example discussed in
Chapter III in the section on the Poisson distribution is an example in
which $k - 2$ is the correct number of degrees of freedom.
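
The counting rule used in these fitting problems can be summarized in a small sketch. The function name `goodness_of_fit_df` is hypothetical, and the figure of 9 cells for the families of 8 children (0 through 8 boys) is an assumption made only for illustration, since the text leaves $k$ unspecified there.

```python
# Minimal sketch of the degrees-of-freedom rule for curve fitting:
# one degree is lost for the fixed total, and one more for each
# parameter that is replaced by its sample estimate.

def goodness_of_fit_df(cells, estimated_parameters):
    """Return k - 1 - (number of parameters estimated from the data)."""
    return cells - 1 - estimated_parameters

# Normal curve fit of the text: 10 cells, mean and variance estimated.
normal_fit_df = goodness_of_fit_df(10, 2)

# Families of 8 children, assuming 9 cells (illustrative value of k).
binomial_p_estimated_df = goodness_of_fit_df(9, 1)  # k - 2
binomial_p_assumed_df = goodness_of_fit_df(9, 0)    # k - 1

print(normal_fit_df, binomial_p_estimated_df, binomial_p_assumed_df)
```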

The preceding discussion in which only those sampling experiments
that give rise to the same sample mean and variance as for the first
experiment are considered is an attempt to describe what corresponds
to the theory of the $\chi^2$ distribution in this case. A theoretical
discussion would involve probability densities and conditional
distributions like those of Chapter VI. The possibility of obtaining
sampling experiments with precisely the same mean and variance as for
the first experiment therefore need not give one concern since this
approach is merely a method of describing sampling from a conditional
distribution.


INDICES OF DISPERSION 

It frequently happens that an experimenter has a set of data that
he believes can be treated as having come from a binomial distribution,
or a Poisson distribution, but which contains so few values that it is
useless to attempt to fit a binomial, or Poisson, distribution to the
observed distribution. In such situations it is customary to test the
hypothesis that the data came from a distribution of the assumed type
by testing the variability of the data. Since these tests can be treated
as special cases of the $\chi^2$ test, they will be introduced from that
point of view.

Let $x_1, x_2, \ldots, x_k$ represent the number of successes in $n$ trials
each in taking $k$ samples from a binomial distribution with
probability $p$. These numbers may be treated as the observed frequencies
in the first row of the following two-rowed contingency table.

$$\begin{array}{cccc} x_1 & x_2 & \cdots & x_k \\ n - x_1 & n - x_2 & \cdots & n - x_k \end{array} \tag{3}$$


If this table is treated as though it were an ordinary contingency table
like the one in the section on contingency tables, the expected
frequencies of the cells in the first row will become

$$e_i = \frac{\sum_{j=1}^{k} x_j}{nk}\, n = \bar{x} \qquad (i = 1, 2, \ldots, k)$$

and therefore those of the second row will become $n - \bar{x}$. As a
result, the value of $\chi^2$ will reduce to

$$\chi^2 = \frac{\sum_{i=1}^{k} (x_i - \bar{x})^2}{\bar{x}\left(1 - \dfrac{\bar{x}}{n}\right)} \tag{4}$$
The contingency table upon which this result is based differs slightly
from the ordinary contingency table treated previously. For the
ordinary table, successive observations were free to fall in any one
of the cells. Then out of such experimental results only those
experiments were considered that possessed the fixed marginal totals of
the first experiment. Among such restricted experiments, the $\chi^2$
distribution is known to represent approximately the relative frequency
of possible values of $\chi^2$. For this binomial contingency table,
however, successive observations are not free to fall in any one of the
cells. The first $n$ observations must fall in one of the two cells of
the first column, the second $n$ observations in one of the two cells of
the second column, etc. It is therefore necessary to consider only those
sampling experiments that produce the same order of experimental results
in addition to the same marginal totals. Since the sampling is random,
this ordering does not affect the relative distribution of values in the
various cells; consequently the ordinary contingency table methods for
applying $\chi^2$ are applicable.

The expression (4) is called the binomial index of dispersion. It is
used to test the hypothesis that the $k$ sample frequencies $x_i$ came
from the same binomial population. Since the sample size is fixed, the
binomial index actually tests the hypothesis that the probability of a
success is the same for all $k$ samples. This is but one of many such
tests that could be designed to check on the experimenter's belief that
the value of $p$ remains unchanged from sample to sample. It is clear
from the nature of $\chi^2$ that the binomial index will exceed its critical
value only if there is excessive variability in the observed frequencies.

As an illustration of the application of (4), consider the following
data on the number of infected plants per plot for 12 plots and 90
plants in each plot: 19, 6, 9, 18, 15, 13, 14, 15, 16, 20, 22, 14. The
problem here is to determine whether it is reasonable to assume that the
infection is distributed at random over the 12 plots. Calculations give,
to the indicated accuracy,

$$\bar{x} = 15.1, \quad \sum_{i=1}^{12} (x_i - \bar{x})^2 = 223, \quad \chi^2 = 18$$

For 11 degrees of freedom, $\chi_0^2 = 19.7$; consequently there is some
doubt whether the infection is distributed at random. For data of
this type, it sometimes happens that the infection is localized and
gradually spreads from the center of concentration; however, the
number of plots here is not sufficiently large to investigate such
possibilities.
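
The infected-plant calculation can be checked with the sketch below. The function name `binomial_index_of_dispersion` is an illustrative choice; the formula coded is the sum of squared deviations from the mean divided by $\bar{x}(1 - \bar{x}/n)$, which is the form implied by the reduction to the Poisson index described in the text.

```python
# Minimal sketch of the binomial index of dispersion for k samples of
# n trials each (here 12 plots of 90 plants).

def binomial_index_of_dispersion(counts, n):
    """Return sum((x - mean)^2) / (mean * (1 - mean / n))."""
    k = len(counts)
    mean = sum(counts) / k
    sum_sq_dev = sum((x - mean) ** 2 for x in counts)
    return sum_sq_dev / (mean * (1 - mean / n))

infected = [19, 6, 9, 18, 15, 13, 14, 15, 16, 20, 22, 14]
chi2 = binomial_index_of_dispersion(infected, n=90)
degrees_of_freedom = len(infected) - 1

print(round(chi2), degrees_of_freedom)
```

The value is compared with the critical value for $k - 1 = 11$ degrees of freedom, as in the text.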

If the value of $p$ is very small and the value of $n$ is very large, the
value of $\bar{x}/n$, which is the sample estimate of $p$, will be very
small; consequently the value of $1 - \bar{x}/n$ will be very nearly 1. If
this approximation is used in (4), the binomial index of dispersion
reduces to what is known as the Poisson index of dispersion, namely,

$$\chi^2 = \frac{\sum_{i=1}^{k} (x_i - \bar{x})^2}{\bar{x}}$$

It would appear that the Poisson index is merely a special case of the
$\chi^2$ test of compatibility given earlier for those situations in which
the expected frequencies are equal; however, there is a distinction in
the nature of the variables. The sum of the frequencies in the ordinary
$\chi^2$ test represents the total number of observations made, whereas in
applications of the Poisson index there are but $k$ observations, each
observation yielding a result that happens to be a frequency number.
It is important to distinguish between these two types of problems
in order to avoid the mistake of applying the ordinary $\chi^2$ test to the
first row only of the binomial frequencies in (3). Such an application
would be equivalent to assuming that the data came from a Poisson
rather than a binomial population. The value of $\chi^2$ would then tend
to be too small.

As an illustration of the Poisson index, consider the problem of
testing whether the following data on the number of defective parts
found in samples of 1,000 parts each are homogeneous: 15, 13, 8, 6, 11,
9, 14, 10, 16, 9, 12. Since the probability of a part's being defective
is very small and $n$ is very large, these frequencies will be treated as
having come from a Poisson population. Calculations give

$$\bar{x} \approx 11.2, \quad \sum (x_i - \bar{x})^2 = 97.6, \quad \sum \frac{(x_i - \bar{x})^2}{\bar{x}} = 8.7$$

For 10 degrees of freedom, $\chi_0^2 = 18.3$; consequently the result is not
significant.
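
The defective-parts calculation can be reproduced with the following sketch; the function name `poisson_index_of_dispersion` is hypothetical, and the critical value is the one quoted in the text.

```python
# Minimal sketch of the Poisson index of dispersion: the sum of squared
# deviations from the sample mean, divided by the sample mean.

def poisson_index_of_dispersion(counts):
    """Return sum((x - mean)^2) / mean for a list of counts."""
    mean = sum(counts) / len(counts)
    return sum((x - mean) ** 2 for x in counts) / mean

defectives = [15, 13, 8, 6, 11, 9, 14, 10, 16, 9, 12]
chi2 = poisson_index_of_dispersion(defectives)
degrees_of_freedom = len(defectives) - 1
critical_5pct = 18.3  # value quoted in the text for 10 degrees of freedom

print(round(chi2, 1), degrees_of_freedom, chi2 > critical_5pct)
```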

Investigations have shown that the $\chi^2$ test must be applied with
discretion when any of the $e_i$ are small, particularly for the binomial
index of dispersion and when fitting theoretical frequency distributions
to observed distributions. An interesting example to illustrate the
errors that may arise in fitting theoretical distributions is given in:

Gumbel, E. J., "On the Reliability of the Classical Chi-Square Test," Annals of
Mathematical Statistics, vol. xiv (1943), pp. 253-263.

In this article, the author advocates the use of a transformation to circumvent
some of the drawbacks of the standard $\chi^2$ test in such cases.


EXERCISES 

1. Toss a coin 100 times, and apply the $\chi^2$ test to see whether the coin is biased.

2. Roll a die 66 times, recording the various number of points obtained, and
apply the $\chi^2$ test to see whether the die is symmetrical.

3. In a breeding experiment it was expected that ducks would be hatched in
the ratio of 1 duck with a white bib to each 3 ducks without bibs. Of 86 ducks
hatched, 17 had white bibs. Is this a reasonable result?

4. In an epidemic of infantile paralysis, 927 children contracted the disease.
Of these, 408 received no serum, and of these 104 became paralyzed. Of those
who did receive serum, 166 became paralyzed. Was the serum effective?

5. Is there any relation between the mentality and weight of criminals as judged
by the following data?


                                  Weight
    Mentality    90-120    120-130    130-140    140-150    150-
    Normal         21         51         94        106       124
    Weak           15         18         34         15        15


6. Show that for a 2 × 2 contingency table with cell frequencies $a$, $b$, $c$, and $d$,
respectively,

$$\chi^2 = \frac{(a + b + c + d)(ad - bc)^2}{(a + b)(c + d)(b + d)(a + c)}$$


7. Use Tippett's random-sampling numbers to sample from the population
expressed by

    x    0      1      2
    f    0.4    0.4    0.2

taking samples of 25 and performing 20 (or more) such sampling experiments.
Calculate $\chi^2$ for each experiment, then classify the 20 (or more) values into a
frequency table. Compare the resulting histogram with the $\chi^2$ curve for $\nu = 2$.
This class exercise is intended to make the stated $\chi^2$ theory seem plausible.

8. Apply the $\chi^2$ test to the normal curve fit for the following 500
determinations of the width of a spectral band of light. Here $e$ denotes the
fitted normal curve frequencies.

    o    5    12    43    61    105    103    89    54    19    7    2
    e    5    14    36    71    102    109    85    50    21    7    2


9. Apply the $\chi^2$ test for goodness of fit to the results of problem 23, Chapter III.

10. Apply the $\chi^2$ test for goodness of fit to the results of problem 27, Chapter III.

11. The following data give the number of colonies of bacteria which developed
on 15 different plates from the same dilution. Is one justified in claiming that the
dilution technique was satisfactory in the sense that the bacteria behave as though
they were randomly distributed in the dilution? The numbers of colonies were:
193, 168, 161, 153, 183, 152, 171, 156, 159, 140, 151, 152, 133, 164, 157.

12. On the basis of a given hypothesis, show that if an experiment yields a value
of $\chi^2 = \chi_1^2$ with $\nu$ degrees of freedom, and if the experiment is repeated with
approximately the same results, then the two experiments combined yield a different
degree of confidence in the hypothesis from the first experiment alone.

13. Find $m$, $\sigma$, $\alpha_3$, and $\alpha_4$ for $f(\chi^2)$, and comment on its normal approximation.

14. Show that, when $\nu = 1$, the variable $\chi$ possesses a normal distribution.

15. Show that the large-sample method of Chapter IV for testing the difference
of percentages is equivalent to the $\chi^2$ test when applied to the 2 × 2 contingency
table of successes and failures.


TESTING STATISTICAL HYPOTHESES

NATURE OF STATISTICAL HYPOTHESES 

A large part of the material presented in the preceding chapters has
been concerned with testing various statistical hypotheses. These
hypotheses were tested by means of distribution functions that were
derived for such purposes or that seemed to be applicable to the given
problem. The $F$ distribution, for example, was derived to test the
hypothesis that two independent normal samples came from populations
with equal variances; however, the $F$ distribution was found to
be applicable for testing many other hypotheses as well. In all these
problems the particular distribution function used and the particular
critical region selected were based on intuitive arguments rather than
upon any logical principle. For example, the 5% critical region for
the simple $\chi^2$ test of the preceding chapter was selected as the 5%
right-hand tail of the $\chi^2$ curve. This choice was based on the reasoning
that, the larger the value of $\chi^2$ in a sampling experiment, the less
faith an experimenter would have in the truth of the hypothesis of
compatibility between observed and expected frequencies. Although such
intuitive arguments often yield highly efficient tests for testing the
hypothesis in question, some logical principle for selecting the proper
test is necessary if one is to be certain of the efficiency of a test. Two
such principles will be considered very briefly in this chapter.

A consideration of the various hypotheses that were tested in the 
preceding chapters will show that most of them consisted in the 
specification of, or the equality of, certain parameter values in the 
distribution function representing the population being sampled. For 
example, the first application of Student's t distribution was the testing 
of the hypothesis that the mean of the normal population from which 
the sample was taken had a specified value. The second application 
of this distribution was testing the hypothesis that the means of two 
normal populations having equal variances were equal. In all these 
tests, the normality assumption was not a part of the hypothesis. A 
statistical hypothesis is usually regarded as a statement that specifies
the value of one or more, or a relationship between two or more, of the
parameters that determine the assumed distribution function.

A test of a statistical hypothesis is a rule for accepting or rejecting
the hypothesis, this rule consisting in the selection of a critical region
for the statistic being used and agreeing to reject the hypothesis if
and only if the sample gives a value of this statistic that falls in the
critical region. For example, in testing the hypothesis that the mean
of a normal population has a specified value, as was done in the example
discussed in the derivation of Student's $t$ distribution, the critical
region consisted of that part of the $t$ axis lying to the right of the
point $t_{0.10}$. In similar problems in which there is no reason for
preferring one tail of Student's $t$ distribution, the critical region
might consist of the two regions $t > t_{0.05}$ and $t < -t_{0.05}$. It will be
recalled that $t_{0.05}$ is a value of $t$ such that
$P[\,|t| > t_{0.05}\,] = 0.05$.


TWO TYPES OF ERROR 

In following a procedure for testing hypotheses, there are two
possibilities for error. If the hypothesis is true but the test rejects the
hypothesis, an error known as a type I error is made. On the other 
hand, if the hypothesis is false but the test accepts the hypothesis, an 
error known as a type II error is made. The relative importance of 
these two kinds of errors depends upon what action is to be taken as a 
result of the test. 

As an illustration, suppose that an innocent man is being tried for a 
crime and that his sentence hinges on the result of a certain experiment. 
If a hypothesis corresponding to innocence was set up and was rejected 
by the experiment, then an innocent man would be convicted and a 
type I error would result. On the other hand, if the man were guilty 
but the experimental result accepted the hypothesis corresponding to 
innocence, then a guilty man would be freed and a type II error would 
result. Here a type I error would be considered by society as more 
serious than a type II error. 

As another illustration, suppose that a new industrial process which
is superior to the standard process is being tested by means of an
experiment. If the hypothesis of no improvement was set up and was
accepted, then a valuable improvement would be lost. This would
usually be more serious than making the mistake of advocating a new
process that in reality is no improvement. Here a type II error would
be more serious than a type I error. The relative importance of these
two types of error is usually not as simple a matter as the above
discussion might indicate. For example, suppose that employees are being
screened by means of a test. The test may be quite good at selecting
the potentially undesirable employee provided that its standards are
sufficiently high; however, these standards may also eliminate a fairly
large percentage of potentially desirable employees. If labor is scarce,
the problem of deciding on the most economical score to choose for
screening becomes quite complex.

A logical procedure for selecting efficient tests of statistical
hypotheses can be designed by means of these two types of error. The
procedure consists in first specifying that all the tests under
consideration shall have the same-size type I error and then selecting
that test for which the type II error is a minimum. Since the size of
the type I error is the probability that the sample will yield a value
falling in the critical region, the size of this error can be regulated
by changing the size of the critical region. Thus, all tests under
consideration can be made to have the same-size type I error, say
$\alpha$, by choosing all critical regions to be of size $\alpha$. The
problem is then reduced to determining which test, if any, minimizes the
type II error. For most of the tests that were employed in the preceding
chapters, the solution of this problem is not simple.

As an illustration of this procedure for designing an efficient test,
consider the problem of testing the hypothesis that the mean of a
certain normal population with unit variance has the value $m_0$. The
procedure employed in Chapter IV for testing this hypothesis on the
basis of a sample of size $n$ was to calculate the quantity

$$u = (\bar{x} - m_0)\sqrt{n} \tag{1}$$

which would be normally distributed with zero mean and unit variance
provided the hypothesis was true, and accept the hypothesis if and only
if $u$ satisfied the inequality

$$-1.96 < u < 1.96$$


In order to determine whether this is an efficient test, it is necessary 
to compare it with other tests. For the sake of simplicity, consider 
only those tests that agree to accept the hypothesis if and only if u 
satisfies the inequality 

ti < u < t 2 


where t\ and t 2 are two numbers such that 


( 2 ) 



<2 

2 dt = 0.95 


The restriction (2) guarantees that all these tests will have a type I
error of size 0.05.

Now the problem is to determine the values of $t_1$ and $t_2$ that will
minimize the type II error. If the hypothesis is false with the mean
of the population equal to $m \ne m_0$, the variable

(3)  $v = (\bar{x} - m)\sqrt{n}$

will be normally distributed with zero mean and unit variance. If 
(3) is used in (1), u can be written in the form 

$u = v + \sqrt{n}(m - m_0)$

consequently the variable $u$ will now be normally distributed with
mean $\sqrt{n}(m - m_0)$ and unit variance. The probability that $u$ will
fall in the interval $(t_1, t_2)$ is therefore given by

(4)  $P = \frac{1}{\sqrt{2\pi}} \int_{t_1}^{t_2} e^{-\frac{1}{2}[u - \sqrt{n}(m - m_0)]^2}\, du$


This integral gives the type II error, which it is desired to minimize 
subject to the restriction (2). 

For the purpose of calculating the derivative of P, it is convenient 
to write P in the form 


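(5)  $P = \frac{1}{\sqrt{2\pi}} \int_{0}^{t_2} e^{-\frac{1}{2}[u - \sqrt{n}(m - m_0)]^2}\, du - \frac{1}{\sqrt{2\pi}} \int_{0}^{t_1} e^{-\frac{1}{2}[u - \sqrt{n}(m - m_0)]^2}\, du$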


If $t_1$ is treated as the independent variable, $t_2$ will be a function of $t_1$
because of condition (2). Now the derivative of $P$ with respect to $t_1$
may be obtained by means of the calculus formula used previously
and given after (5), Chapter VI. Application of this formula to (5)
gives

(6)  $\frac{dP}{dt_1} = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}[t_2 - \sqrt{n}(m - m_0)]^2}\, \frac{dt_2}{dt_1} - \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}[t_1 - \sqrt{n}(m - m_0)]^2}$


The integral in (2) may be treated in the same manner to give 


(7)  $\frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t_2^2}\, \frac{dt_2}{dt_1} - \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}t_1^2} = 0$

If the value of $dt_2/dt_1$ from (7) is substituted in (6), (6) will reduce to


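$\frac{dP}{dt_1} = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}\{[t_2 - \sqrt{n}(m - m_0)]^2 + t_1^2 - t_2^2\}} - \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}[t_1 - \sqrt{n}(m - m_0)]^2}$

This simplifies into

(8)  $\frac{dP}{dt_1} = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}[t_1^2 + n(m - m_0)^2]} \left( e^{t_2\sqrt{n}(m - m_0)} - e^{t_1\sqrt{n}(m - m_0)} \right)$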


Since $t_2 > t_1$, it follows from (8) that $dP/dt_1 \gtrless 0$ according as
$m \gtrless m_0$. This means that $P$ is an increasing function of $t_1$ for $m > m_0$
and a decreasing function for $m < m_0$; consequently the maximum and
minimum values of $P$ are assumed when $t_1$ takes on its extreme values.
Thus, for $m > m_0$ the minimum value of $P$ occurs when $t_1 = -\infty$
and its maximum value occurs when $t_2 = \infty$, since $t_1$ assumes its largest
value then. For $m < m_0$ the maximum value of $P$ occurs when
$t_1 = -\infty$ and its minimum value occurs when $t_2 = \infty$. There is
therefore no pair of numbers, $t_1$ and $t_2$, which will minimize the type II
error for all possible values of $m \ne m_0$.
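
The monotone behavior of $P$ just derived can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes Python's standard `statistics.NormalDist`, and the helper names `t2_from_t1` and `type2_error` are hypothetical. For each $t_1$ it finds the $t_2$ required by (2) and then evaluates (4), with `shift` standing for $\sqrt{n}(m - m_0)$.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def t2_from_t1(t1, size=0.05):
    """Upper limit t2 such that P(t1 < u < t2) = 1 - size when the hypothesis is true."""
    return Z.inv_cdf(Z.cdf(t1) + (1.0 - size))

def type2_error(t1, shift, size=0.05):
    """P of (4), where `shift` stands for sqrt(n) * (m - m0)."""
    t2 = t2_from_t1(t1, size)
    return Z.cdf(t2 - shift) - Z.cdf(t1 - shift)

# t1 must stay below the 5% point (about -1.645) so that a finite t2 exists.
t1_grid = [-4.0, -3.0, -2.5, -1.96, -1.8, -1.7]
for shift in (1.0, -1.0):          # m > m0 and m < m0, respectively
    errors = [type2_error(t1, shift) for t1 in t1_grid]
    print(f"shift = {shift:+.1f}:", [round(e, 3) for e in errors])
```

With a positive shift the printed errors should rise as $t_1$ increases, and with a negative shift they should fall, in agreement with the sign of (8).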

If $t_1$ and $t_2$ are held fixed and $P$ in (4) is treated as a function of $m$,
its graph will show how the type II error changes with $m$. From
the preceding discussion it follows that the two curves corresponding
to $t_1 = -\infty$ and $t_2 = \infty$ will determine a region in the $P$, $m$ plane
within which all other test curves will lie. From (4) it will be observed
that, for $t_1 = -\infty$, $P \to 1$ as $m \to -\infty$ and $P \to 0$ as $m \to \infty$, whereas,
for $t_2 = \infty$, $P \to 0$ as $m \to -\infty$ and $P \to 1$ as $m \to \infty$. These results
follow readily if one thinks geometrically of a standard normal curve
which moves off to infinity. The preceding results show that the two
boundary curves in the $P$, $m$ plane have the lines $P = 0$ and $P = 1$
as horizontal asymptotes. These two boundary curves are represented
graphically by a and b in Fig. 1. The curve corresponding to the test
given in (1) is labeled c. Other pairs of values of $t_1$ and $t_2$ satisfying
(2) would yield curves lying between the curves a and b.
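
The three curves described above can be tabulated directly from (4). The sketch below is illustrative only: it assumes $m_0 = 0$ and $n = 1$ for convenience, uses Python's standard `statistics.NormalDist`, and the function name `type2_error` is hypothetical. The labels a, b, and c follow the text, with a taken as $t_1 = -\infty$ and b as $t_2 = \infty$.

```python
from statistics import NormalDist

Z = NormalDist()
INF = float("inf")

def type2_error(t1, t2, m, m0=0.0, n=1):
    """P of (4): probability that u lies in (t1, t2) when the true mean is m."""
    shift = (n ** 0.5) * (m - m0)
    return Z.cdf(t2 - shift) - Z.cdf(t1 - shift)

z05 = Z.inv_cdf(0.95)     # about 1.645
z025 = Z.inv_cdf(0.975)   # about 1.960
curves = {
    "a": (-INF, z05),     # t1 = -infinity
    "b": (-z05, INF),     # t2 = +infinity
    "c": (-z025, z025),   # the symmetrical test based on (1)
}

for m in (-2.0, -1.0, 0.0, 1.0, 2.0):
    row = {name: round(type2_error(t1, t2, m), 3) for name, (t1, t2) in curves.items()}
    print(m, row)
```

In the printed table curve b should lie lowest for $m < m_0$ and curve a lowest for $m > m_0$, with curve c between them on both sides.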

From the preceding discussion and Fig. 1, it is clear that there is
no best test among the class of tests considered for testing the hypothesis
$m = m_0$ because there is no curve lying below all other curves for
all values of $m \ne m_0$. However, if only values of $m < m_0$ were considered
as possible alternatives to $m = m_0$, then the curve b would
correspond to the best possible test because the type II error would
be less than that for any other test for every value of $m < m_0$. Since
the minimizing curve b corresponds to the value $t_2 = \infty$, from (2)
it follows that the 5% critical region here is determined by the inequality
$u < -1.64$. This value merely determines the 5% left tail
of the standard normal curve. It may be recalled that this is precisely
the critical region that was selected on intuitive grounds in the first
problem on applications of Theorem I, Chapter IV. The procedure
followed in the solution of that problem was therefore a highly efficient
one from this point of view.

From Fig. 1 it appears that the test corresponding to curve c is a
fairly efficient test provided that all values of $m \ne m_0$ are assumed as
possible alternatives to the hypothesis of $m = m_0$. Unless further
restrictions are placed on the nature of the tests that should be considered,
there does not appear to be any test more efficient than this
symmetrical test for this case. These considerations help to justify
the intuitive procedure in problems of the type indicated in (1).

It can be shown with considerably more difficulty that Student's
$t$ test has efficiency properties analogous to those of the one-sided and
symmetrical normal curve tests for testing the hypothesis $m = m_0$
when the value of $\sigma$ is unspecified. Such investigations justify the
use of the $t$ test in previous applications.


MAXIMUM LIKELIHOOD 

The illustration of the preceding section gives an indication of the 
difficulties that are met in designing efficient tests. Although the 
hypothesis in that illustration was very simple, no test existed that 
minimized the type II error for all possible alternatives. For more 
complex hypotheses, the occurrence of tests with this minimizing 
property is infrequent. It is often possible in such situations to place 
further restrictions on the class of tests that will be considered and 
thereby determine a best test from among this restricted set; however, 
such ideas will not be discussed here. 

A second basis for selecting efficient tests is one called the principle
of maximum likelihood. This principle is justified partly on its strong
intuitive appeal but mostly on the desirable properties the tests it
produces are found to possess.


1. Maximum Likelihood Estimation 

Suppose that the variable $x$ has a distribution function $f(x; \theta)$ which
depends upon the single parameter $\theta$. The distribution function of a
random sample of size $n$ will therefore be given by

(9)  $P = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta)$

For a given sample, $P$ represents the probability density at the sample
point $x_1, x_2, \cdots, x_n$, or the probability of obtaining the sample, depending
upon whether $x$ is a continuous or a discrete variable. In either
case, $P$ represents a function of $\theta$ which is called the likelihood function
for the sample. This name corresponds to one’s intuitive belief that
an estimate of $\theta$ that makes $P$ relatively large is likely to be a good
estimate of the parameter. If this belief is capitalized upon, it gives
rise to a technique of estimating population parameters that is known
as the method of maximum likelihood. The technique is often quite
simple since it is merely necessary to solve the equation $dP/d\theta = 0$
for $\theta$.

As an illustration of the maximum-likelihood technique, consider the 
problem of estimating the mean m in the Poisson function 


$f(x; m) = \frac{e^{-m} m^x}{x!}$

Here, the likelihood function given by (9) reduces to

$P = \frac{e^{-m} m^{x_1}}{x_1!} \cdots \frac{e^{-m} m^{x_n}}{x_n!} = \frac{e^{-nm}\, m^{n\bar{x}}}{x_1!\, x_2! \cdots x_n!}$

Therefore,

$\frac{dP}{dm} = \frac{n\bar{x}\, m^{n\bar{x} - 1}\, e^{-nm}}{x_1!\, x_2! \cdots x_n!} - \frac{n\, m^{n\bar{x}}\, e^{-nm}}{x_1!\, x_2! \cdots x_n!}$

The solution of $dP/dm = 0$ is evidently $m = \bar{x}$. This result shows that
the probability of obtaining a given set of sample values from a Poisson
population is a maximum when the population mean is equal to the
sample mean.

The preceding technique can be extended directly to distribution
functions of more than one parameter and more than one variable.
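
The Poisson illustration can be confirmed by a crude numerical search. This is a sketch under stated assumptions rather than a recommended procedure: the sample counts are hypothetical, the function name `poisson_log_likelihood` is invented for the illustration, and the logarithm of $P$ is maximized instead of $P$ itself, which locates the same value of $m$ because the logarithm is an increasing function.

```python
import math

def poisson_log_likelihood(m, sample):
    """Logarithm of the likelihood (9) for a Poisson population with mean m."""
    return sum(-m + x * math.log(m) - math.lgamma(x + 1) for x in sample)

sample = [2, 0, 3, 1, 4, 2, 1, 3]    # hypothetical counts
x_bar = sum(sample) / len(sample)    # sample mean

# Search a grid of candidate means from 0.01 to 10.00 in steps of 0.01.
grid = [0.01 * k for k in range(1, 1001)]
m_hat = max(grid, key=lambda m: poisson_log_likelihood(m, sample))
print("sample mean:", x_bar, " grid maximizer:", round(m_hat, 2))
```

The grid maximizer should agree with the sample mean to within the grid spacing.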


2. Simple Hypotheses 


The principle of maximum likelihood can also be applied to the
problem of selecting efficient tests for testing statistical hypotheses.
For simplicity of explanation, suppose as before that the distribution
function of the variable $x$ depends upon a single parameter $\theta$ and that
the hypothesis to be tested is that $\theta = \theta_0$. If the sample values of
$x$ are held fixed, $P$ in (9) is a function of $\theta$ only. Let $P_m(\theta)$ denote the
maximum value of this function, and let $P(\theta_0)$ denote its value for
$\theta = \theta_0$. Then the ratio

(10)  $\lambda = \frac{P(\theta_0)}{P_m(\theta)}$

is a function of the sample values $x_1, \cdots, x_n$ only. Since $P(\theta)$ includes
$P(\theta_0)$ as one of its possible values, its maximum value must be at least
as great as the value of $P(\theta_0)$; consequently $\lambda$ satisfies the inequality
$0 \le \lambda \le 1$. This ratio is called the likelihood ratio for the hypothesis
being tested. If $\lambda$ is close to 1, the probability density of the sample
point could not be increased much by allowing $\theta$ to assume values other
than $\theta_0$; consequently a value of $\lambda$ near 1 corresponds intuitively to
considerable belief in the truth of the hypothesis that $\theta = \theta_0$. If increasing
values of $\lambda$ are treated as corresponding to increasing degrees of
belief in the truth of the hypothesis, and if the distribution function
of $\lambda$ can be found, then a critical value of $\lambda$, say $\lambda_0$, could be determined
such that $P[\lambda < \lambda_0] = 0.05$ when the hypothesis is true, and the
hypothesis would be rejected if and only if $\lambda < \lambda_0$.

Although this method of selecting a test is based largely on intuitive 
arguments as contrasted with the preceding method, it is highly useful 
for those hypotheses for which a minimum type II error does not exist 
and for complicated hypotheses. Experience and theory indicate that 
likelihood tests possess many desirable properties. 

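As an illustration of the application of the likelihood principle, consider
the same hypothesis as for the illustrative example of the preceding
section. Since $\sigma = 1$ in that example, the only parameter is $\theta = m$;
consequently

(11)  $P(m) = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - m)^2}$

The hypothesis to be tested is that $m = m_0$; therefore

$P(m_0) = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - m_0)^2}$

By differentiating (11), the maximum likelihood value of $m$ will be
found to be $m = \bar{x}$; consequently

$P_m(m) = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\sum_{i=1}^{n}(x_i - \bar{x})^2}$

If these two values are substituted in (10), it will reduce to

(12)  $\lambda = e^{-\frac{1}{2}\left[\sum_{i=1}^{n}(x_i - m_0)^2 - \sum_{i=1}^{n}(x_i - \bar{x})^2\right]} = e^{-\frac{1}{2}[-2n\bar{x}m_0 + nm_0^2 + n\bar{x}^2]} = e^{-\frac{n}{2}(\bar{x} - m_0)^2}$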


Since $n$ and $m_0$ are known constants, this equation expresses a relationship
between $\lambda$ and $\bar{x}$. By means of this relationship it would be
possible to find the distribution function of $\lambda$ from that of $\bar{x}$; however,
for the purpose of testing hypotheses it is merely necessary to know
how to find the critical region for the distribution function of $\lambda$ from
that for $\bar{x}$. Now from (12) it is clear that to each value of $\lambda$ there
correspond two values of $\bar{x}$ and that these values of $\bar{x}$ are symmetrical
with respect to $\bar{x} = m_0$. There will therefore be two values of $\bar{x}$
corresponding to the critical value of $\lambda = \lambda_0$. Furthermore, increasingly
small values of $\lambda$ correspond to increasingly large values of
$|\bar{x} - m_0|$. Therefore the 5% critical region for $\lambda$ consisting of the
interval $0 < \lambda < \lambda_0$ will correspond to the two 2½% tails of the $\bar{x}$
normal distribution. The critical region for $\bar{x}$ therefore consists of the
two intervals given by $|\bar{x} - m_0|\sqrt{n} > 1.96$. A comparison of this
result with (1) shows that the likelihood ratio test is merely the commonly
employed test for this hypothesis.
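
The equivalence just described can be illustrated numerically. The sketch below is only an illustration: the sample values and $m_0 = 10$ are hypothetical, unit variance is assumed as in the text, and the function name `likelihood_ratio` is invented here. It computes $\lambda$ from the two likelihoods (the factor $(2\pi)^{-n/2}$ cancels), compares it with $e^{-u^2/2}$, where $u$ is the quantity in (1), and checks that the rule $\lambda < \lambda_0$ with $\lambda_0 = e^{-(1.96)^2/2}$ gives the same decision as $|u| > 1.96$.

```python
import math
from statistics import NormalDist

def likelihood_ratio(sample, m0):
    """Likelihood ratio for a normal mean with unit variance."""
    x_bar = sum(sample) / len(sample)
    log_p0 = -0.5 * sum((x - m0) ** 2 for x in sample)     # log P(m0), constant omitted
    log_pm = -0.5 * sum((x - x_bar) ** 2 for x in sample)  # log of the maximum, constant omitted
    return math.exp(log_p0 - log_pm)

sample = [10.8, 9.4, 11.9, 10.3, 12.1, 9.9, 11.2, 10.6]   # hypothetical observations
m0 = 10.0
n = len(sample)
x_bar = sum(sample) / n

u = (x_bar - m0) * math.sqrt(n)        # the statistic in (1)
lam = likelihood_ratio(sample, m0)
lam_closed = math.exp(-0.5 * u * u)    # closed form in terms of u

z = NormalDist().inv_cdf(0.975)        # about 1.96
lam0 = math.exp(-0.5 * z * z)          # critical value of the likelihood ratio
reject_by_lambda = lam < lam0
reject_by_u = abs(u) > z
print(round(lam, 4), round(lam_closed, 4), reject_by_lambda, reject_by_u)
```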


3. Composite Hypotheses 

When a distribution function of one variable depends upon more 
than one parameter and not all of them are specified by the hypothesis, 
the hypothesis is called composite and the likelihood ratio is defined 
more generally in the following manner. 

Let $P(x_1, \cdots, x_n; \theta_1, \cdots, \theta_k)$ denote the probability density (or
probability) function given in (9) when this function depends upon $k$
parameters, and let $P(x_1, \cdots, x_n; \theta_1', \cdots, \theta_k')$ represent this function
when those values of the parameters that are specified by the hypothesis
have been inserted. Then the likelihood ratio is defined as

(13)  $\lambda = \frac{P_m(x_1, \cdots, x_n; \theta_1', \cdots, \theta_k')}{P_m(x_1, \cdots, x_n; \theta_1, \cdots, \theta_k)}$

where, as before, the subscript $m$ on $P$ denotes its maximum value with
respect to the parameters involved. It is clear that (10) is a special
case of (13) when $k = 1$. The same intuitive arguments employed to
justify the use of $\lambda$ in (10) as a basis for testing a simple hypothesis
may be employed to justify the use of $\lambda$ in (13) for testing a composite
hypothesis.

As an illustration of how (13) may be used to design tests of more 
complicated hypotheses, consider the problem of deciding whether a set 
of variances is homogeneous. In measuring the variability of indus¬ 
trial processes, for example, it is necessary to know whether the process 
variability has changed; consequently a test of homogeneity is required. 

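Consider $k$ normal populations with respective means and variances
given by $m_i$ and $\sigma_i^2$ $(i = 1, \cdots, k)$. Let random samples of sizes $n_i$
be drawn from these populations. Then the hypothesis to be tested
here is that

(14)  $\sigma_1 = \sigma_2 = \cdots = \sigma_k$

For simplicity of notation, the probability density will be denoted by
$P(x_{ij}; m_i, \sigma_i)$, where $x_{ij}$ represents the $\sum_{1}^{k} n_i = n$ variables, and $m_i$ and
$\sigma_i$ represent the $2k$ parameters. Here

(15)  $P(x_{ij}; m_i, \sigma_i) = \frac{e^{-\frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(\frac{x_{ij} - m_i}{\sigma_i}\right)^2}}{(2\pi)^{n/2}\, \sigma_1^{n_1} \cdots \sigma_k^{n_k}}$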

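When the hypothesis (14) is true, (15) reduces to

(16)  $P(x_{ij}; m_i, \sigma) = \frac{e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij} - m_i)^2}}{(2\pi)^{n/2}\, \sigma^n}$

where $\sigma$ represents the common value of the $\sigma_i$. In order to apply (13),
it is necessary to maximize (15) and (16) with respect to their parameters.
This is accomplished by first taking logarithms of both sides.
If (15) and (16) are denoted by $P$ and $P'$, respectively, then

$\frac{\partial \log P}{\partial m_i} = \frac{1}{\sigma_i^2} \sum_{j=1}^{n_i} (x_{ij} - m_i)$

$\frac{\partial \log P}{\partial \sigma_i} = -\frac{n_i}{\sigma_i} + \frac{1}{\sigma_i^3} \sum_{j=1}^{n_i} (x_{ij} - m_i)^2$

$\frac{\partial \log P'}{\partial m_i} = \frac{1}{\sigma^2} \sum_{j=1}^{n_i} (x_{ij} - m_i)$

$\frac{\partial \log P'}{\partial \sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma^3} \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - m_i)^2$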

From the first and third of these derivatives, it follows that the maximum
likelihood estimates for $m_i$ are in each case given by $\hat{m}_i = \bar{x}_i$.
From the second and fourth of these derivatives, it follows that the
respective maximum likelihood estimates for $\sigma_i^2$ and $\sigma^2$ are given by

$\hat{\sigma}_i^2 = \sum_{j=1}^{n_i} \frac{(x_{ij} - \bar{x}_i)^2}{n_i} = s_i^2$

and

$\hat{\sigma}^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \frac{(x_{ij} - \bar{x}_i)^2}{n} = \sum_{i=1}^{k} \frac{n_i s_i^2}{n}$

If these estimates are substituted in (15) and (16), respectively, $P$
and $P'$ will become

$P_m = \frac{e^{-\frac{n}{2}}}{(2\pi)^{n/2}\, s_1^{n_1} \cdots s_k^{n_k}}$

and

$P'_m = \frac{e^{-\frac{n}{2}}}{(2\pi)^{n/2} \left[\frac{n_1 s_1^2 + \cdots + n_k s_k^2}{n}\right]^{n/2}}$

The likelihood ratio given by (13) therefore reduces to

(17)  $\lambda = \frac{s_1^{n_1} \cdots s_k^{n_k}}{\left[\frac{n_1 s_1^2 + \cdots + n_k s_k^2}{n}\right]^{n/2}}$

If both sides of this expression are raised to the power $2/n$, it will
be observed that $\lambda^{2/n}$ is the ratio of the geometric and arithmetic means
of the sample variances, each variance being weighted by its $n_i$.
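
The relation between (17) and the two means can be verified on a small example. The sketch below uses hypothetical measurements from three processes, and the helper names `ml_variance` and `log_lambda` are invented for the illustration. The ratio is computed in logarithms to avoid very small numbers, and both means are weighted by the sample sizes.

```python
import math

def ml_variance(sample):
    """Maximum likelihood variance estimate s^2 (divisor n_i, not n_i - 1)."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / len(sample)

def log_lambda(samples):
    """Natural logarithm of the likelihood ratio (17)."""
    sizes = [len(s) for s in samples]
    variances = [ml_variance(s) for s in samples]
    n = sum(sizes)
    pooled = sum(ni * vi for ni, vi in zip(sizes, variances)) / n
    log_numerator = sum(0.5 * ni * math.log(vi) for ni, vi in zip(sizes, variances))
    return log_numerator - 0.5 * n * math.log(pooled)

# Hypothetical measurements from k = 3 processes.
samples = [
    [10.1, 9.8, 10.4, 10.0, 9.7],
    [10.9, 9.1, 10.6, 9.5, 10.2, 9.9],
    [10.3, 10.0, 9.9, 10.2],
]
sizes = [len(s) for s in samples]
variances = [ml_variance(s) for s in samples]
n = sum(sizes)

# Weighted geometric and arithmetic means of the sample variances.
geometric = math.exp(sum(ni * math.log(vi) for ni, vi in zip(sizes, variances)) / n)
arithmetic = sum(ni * vi for ni, vi in zip(sizes, variances)) / n
lam = math.exp(log_lambda(samples))
print(round(lam ** (2 / n), 6), round(geometric / arithmetic, 6))
```

The two printed numbers should coincide, and since a geometric mean never exceeds the corresponding arithmetic mean, $\lambda$ cannot exceed 1.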

If now the distribution function of $\lambda$ were available, it would be
possible to find critical values of $\lambda$ for deciding whether to accept the
hypothesis of equal variances. Because of the complexity of this
distribution function, it is necessary to resort to convenient approximations.
If the values of the $n_i$ are fairly large, it turns out that the
quantity $-2\log_e \lambda$ has a distribution that can be approximated fairly
well by the $\chi^2$ distribution with $k - 1$ degrees of freedom; consequently
critical values of $\lambda$ may be obtained from the corresponding critical
values of $\chi^2$.

A somewhat more accurate test, particularly for small values of the
$n_i$, than the likelihood test just discussed is available. In this test
each $n_i$ in $\lambda$ is interpreted as the number of degrees of freedom in $s_i^2$,
and $s_i^2$ is interpreted as the unbiased estimate of $\sigma_i^2$. If the resulting
value of $\lambda$ is denoted by $\mu$, this test consists in treating

(18)  $\frac{-2\log_e \mu}{1 + \frac{1}{3(k - 1)}\left[\sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{n}\right]}$

as a variable with a $\chi^2$ distribution with $k - 1$ degrees of freedom,
the $n_i$ and their sum $n$ in (18) being the numbers of degrees of freedom.
These changes in the interpretation of $n_i$ and $s_i^2$ will not affect the
value of $\lambda$ appreciably if the $n_i$ are large; however, for small samples
this correction becomes important. For the special case in which all
the $n_i$ are equal, it can easily be shown that

(19)  $\log_e \mu = \frac{\bar{n}_i}{n_i} \log_e \lambda$

where $\bar{n}_i$ denotes the number of degrees of freedom corresponding to
$n_i$. Thus, this correction is much like that used to eliminate the bias
in large-sample estimates of variances. This improved version of the
likelihood ratio test was designed when investigations showed that the
likelihood ratio test was slightly biased.

Maximum likelihood estimates and tests are known to possess desir¬ 
able properties for large samples; however, many of them turn out 
to be biased for small samples and need to be corrected accordingly if 
they are to be used on small samples. 

As a numerical illustration of this test, consider once more the second
problem on applications of the $\chi^2$ distribution in Chapter VIII. In
that problem, five sample variances were combined to yield a single
estimate of $\sigma^2$ on the assumption that the variances were homogeneous.
Here

$n_1 s_1^2 = 1{,}185$,  $n_2 s_2^2 = 1{,}599$,  $n_3 s_3^2 = 4{,}214$,
$n_4 s_4^2 = 1{,}478$,  $n_5 s_5^2 = 705$,  $\sum_{1}^{5} n_i s_i^2 = 9{,}181$

Since each variance was based on five measurements, $n_i = 5$ if $\lambda$ is
used and $\bar{n}_i = 4$ if $\mu$ is used. The value of $n_i s_i^2$ is the same for $\lambda$ and
$\mu$. Calculations give $-2\log_e \lambda = 4.60$; therefore by (19) and (18),
$-2\log_e \mu = 3.68$ and, with $\bar{n} = \sum \bar{n}_i = 20$,

$\frac{-2\log_e \mu}{1 + \frac{1}{3(k - 1)}\left[\sum_{i=1}^{k} \frac{1}{\bar{n}_i} - \frac{1}{\bar{n}}\right]} = \frac{3.68}{1.10} = 3.35$

Since $\chi_0^2 = 9.5$ for $k - 1 = 4$ degrees of freedom, this result is not
significant. The assumption of homogeneity appears to have been a
reasonable one.
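
The arithmetic of this illustration can be retraced in a few lines. The sketch below takes the five values of $n_i s_i^2$ from the text; the function name `minus_two_log_ratio` is hypothetical, and it relies on the fact that all five samples carry the same weight (5 measurements, or 4 degrees of freedom).

```python
import math

sums_of_squares = [1185, 1599, 4214, 1478, 705]   # the values n_i * s_i^2 from the text
k = len(sums_of_squares)
n_i = 5           # measurements per sample, used for lambda
df_i = n_i - 1    # degrees of freedom per sample, used for mu

def minus_two_log_ratio(ss, weight):
    """-2 log of the ratio (17) when every sample carries the same weight."""
    total_weight = weight * len(ss)
    pooled = sum(ss) / total_weight
    return total_weight * math.log(pooled) - sum(weight * math.log(s / weight) for s in ss)

m2_log_lambda = minus_two_log_ratio(sums_of_squares, n_i)       # about 4.60
m2_log_mu = minus_two_log_ratio(sums_of_squares, df_i)          # about 3.68
correction = 1 + (k / df_i - 1 / (k * df_i)) / (3 * (k - 1))    # denominator of (18)
statistic = m2_log_mu / correction
print(round(m2_log_lambda, 2), round(m2_log_mu, 2), round(correction, 2), round(statistic, 2))
```

The resulting statistic is well below the tabled 5% value of about 9.5 for 4 degrees of freedom, in agreement with the conclusion above.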

EXERCISES

1. Find the maximum likelihood estimate of $p$ for a binomial distribution based
on a sample of size $n$.

2. Find the maximum likelihood estimates of $m$ and $\sigma$ for a normal distribution.

3. Calculate the type II error if the hypothesis $\sigma = 10$ is tested against the
single alternative $\sigma = 20$ on the basis of a sample of size 4 from a normal population
and if the critical region consists of the 5% right tail of the distribution of $s^2$.

4. Prove that the likelihood ratio test of the hypothesis $m = m_0$ for a normal
population of unknown variance $\sigma^2$ is equivalent to Student’s $t$ test for this hypothesis.



CHAPTER XII 


STATISTICAL DESIGN IN EXPERIMENTS 

It is a common occurrence for experimenters who are unacquainted 
with statistical principles to seek statistical assistance when their 
experiments fail to produce the results anticipated by them. In some 
experiments the data were obtained in such a manner as to exclude any 
valid conclusions of the type desired; in others, there is little that can 
be done to extract further information from the data because the experi¬ 
ment was not designed with a statistical analysis in mind. Only rarely 
are the experiments that give valid conclusions as efficient as they 
would have been if a standard statistical design had been employed. 
Too many experimenters do not seem to appreciate the obvious injunc¬ 
tion that the time to design an experiment is before the experiment is 
begun. 

In this chapter, the statistical design of experiments will be con¬ 
sidered from the point of view of validity and efficiency. Although 
an experimental design that does not yield valid results may be con¬ 
sidered inefficient, it is convenient to distinguish between these two 
concepts because a valid design need not be an efficient one. Only a
few of the many techniques available in statistical literature for assist¬ 
ing in the designing of experiments will be considered in this chapter. 


VALIDITY 

In most experiments there are several variables in addition to the 
one or more being investigated that need to be controlled if the experi¬ 
ment is to give valid conclusions. In some cases these interfering 
variables can be controlled by laboratory techniques; in others such 
control may be possible only through statistical design. As a simple
illustration, consider an agricultural experiment in which two different
seed varieties are to be tested on a piece of land. If the piece of land
were divided into two equal pieces and one variety planted on each,
the difference in yields could not be used as a valid estimate of the
differential effect of the two seed varieties because of the possible difference
in soil fertility of the two pieces.

Experiments can often be made valid by applying the principles of
randomization and replication. Thus, in the present illustration, if the
piece of land were divided into a number of small plots of equal size, 
and if one variety of seed were planted on half of those plots and the 
other variety on the remaining half, with the selection of plots deter¬ 
mined by a random process, then the varying fertility of the land would 
affect the two varieties approximately equally and therefore the differ¬ 
ence in varietal yields would represent a valid estimate of the differential 
effects of the two seed varieties. 
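
A random assignment of this kind is easily produced by machine. The following is a minimal sketch, assuming Python's standard `random` module; the function name `randomize_plots`, the labels A and B, the choice of 12 plots, and the seed are all hypothetical.

```python
import random

def randomize_plots(n_plots, varieties=("A", "B"), seed=None):
    """Assign each variety to an equal number of plots in random order."""
    if n_plots % len(varieties) != 0:
        raise ValueError("the number of plots must be a multiple of the number of varieties")
    rng = random.Random(seed)
    labels = list(varieties) * (n_plots // len(varieties))
    rng.shuffle(labels)   # every arrangement of the labels is equally likely
    return labels

layout = randomize_plots(12, seed=1947)
print(layout)   # plot 1, plot 2, ... in field order
```

Fixing the seed merely makes the layout reproducible for record keeping; any seed gives an equally valid random arrangement.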

Randomization by itself is not necessarily sufficient to yield a valid 
experiment. For example, if one merely tossed a coin to determine 
which half of the original piece of land should be planted with one 
of the seed varieties, the selection would be random but it would not 
permit the two seed varieties to be equally affected by any varying 
fertility. In order to insure validity, it would be necessary that the 
piece of land be divided into a sufficiently large number of similar 
plots so that the probability will be very small of having one of the 
seed varieties largely located on the more fertile plots. This repetition 
of an experiment or experimental unit is called replication. Thus, to 
insure validity in an experiment, randomization should be accompanied 
by sufficient replication. 
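
The effect of replication on this probability can be illustrated by a simple calculation. The sketch below rests on assumptions that are not in the text: the land is divided into $2r$ plots of which exactly $r$ are more fertile, one variety is assigned at random to $r$ of the plots, and an assignment is called lopsided when that variety receives at least 80% of the more fertile plots. The function name `prob_lopsided` is hypothetical.

```python
from math import comb

def prob_lopsided(r, x_min):
    """P(one variety receives at least x_min of the r more fertile plots)
    when it is assigned at random to r of the 2r plots."""
    total = comb(2 * r, r)
    favorable = sum(comb(r, x) * comb(r, r - x) for x in range(x_min, r + 1))
    return favorable / total

# (r, x_min) pairs chosen so that x_min is 80% of r, rounded up.
for r, x_min in [(1, 1), (5, 4), (10, 8), (20, 16)]:
    print(2 * r, "plots:", round(prob_lopsided(r, x_min), 4))
```

With only two plots the probability is one half, which is the coin-tossing case described above; it should fall off rapidly as the number of plots increases.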

Not only are randomization and replication useful techniques for 
assisting in the construction of valid experiments, but they are often 
essential to certain classes of experiments whose conclusions depend 
upon the use of distribution functions. Since all the distribution func¬ 
tions in this book were derived upon the basis of random sampling, 
it follows that the methods employed in the preceding chapters are 
applicable to such samples only; consequently any experiment whose 
conclusions depend upon such methods requires randomization. Repli¬ 
cation is also necessary for the application of any method that obtains 
its measure of variability directly from the data because at least two 
observations are needed to measure variability. For example, the 
illustrative experiment just discussed requires randomization and 
replication if the difference between mean yields is to be tested by 
means of Student's t distribution, because the t distribution is based on 
random sampling and because sample variances are needed to evaluate t. 

The requirement of random samples for the applicability of most 
statistical methods is not always easy to satisfy. For example, if the 
product of a machine is sampled every hour for several days, it may 
easily happen that the product of the machine changes during the day 
because of the operator's working pattern and also from day to day 
because of wear. For situations like this in which observations are
ordered with respect to time, one of the previous methods for testing
randomness should be applied before methods based upon random 
samples are used. 


EFFICIENCY 

In the preceding illustration, the techniques of randomization and 
replication removed much of the danger of obtaining biased results; 
however, these techniques did not remove the effect of differences in 
soil fertility on the variability of yields. If the variation in fertility 
is increased, the variation in yield is thereby increased. As a conse¬ 
quence, if Student’s t distribution for testing the difference between 
two means were applied, a considerably larger sample might be needed 
to produce a significant difference with large fertility differences between 
plots than if the plots were of uniform fertility because of the larger 
estimate of variance involved in the denominator of $t$. Such an experiment
could therefore be made more efficient by selecting plots of uniform
fertility. Very often, however, it is not feasible to control the
fertility in this manner. Nevertheless, by arranging the plots into 
small homogeneous groups, it is often possible to eliminate statistically 
the greater share of the fertility variability effects in the t test and 
thereby make the experiment more efficient. This approach to effi¬ 
ciency will be treated in the next section. 

No attempt will be made here to state what is meant by an efficient 
experiment; however, certain common aspects of efficiency will be 
treated. For experiments of fixed size, methods for increasing the 
sensitivity of an experiment will be considered, whereas, for experiments 
of variable size, methods for minimizing the amount of sampling 
needed to insure the desired sensitivity will be considered. 


ANALYSIS OF VARIANCE 

One of the most useful techniques for increasing the sensitivity of an 
experiment is the designing of the experiment in such a way that the 
total variation of the variable being studied can be separated into 
components that are of experimental interest. This technique, which 
is called the analysis of variance, was introduced in Chapter VIII as 
an application of the F test. It enables the experimenter to utilize 
statistical methods to eliminate the effects of certain interfering 
variables. 

An example of such an experiment occurred in the third application 
of the F distribution in Chapter VIII. In that experiment 4 equal 
plots of land were each divided into 5 equal subplots. Then the 5
different treatments being tested were assigned to the 5 subplots in
each of the 4 plots by a random process. The advantage of assigning 
treatments at random within a plot rather than assigning them at 
random throughout all 20 subplots lies in the fact that the fertility 
of the soil is likely to be more homogeneous in small neighboring 
groups of 5 subplots than it is throughout all 20 subplots and that it 
may therefore be possible to measure and hence eliminate some of the 
interfering soil variability by this procedure. For the purpose of 
observing the advantage of this technique, consider the difference in 
approach in testing for differences among treatments before and after 
the 4 plots are segregated. Before segregation the variation in yield 
was broken down according to (21), Chapter VIII, into the variation 
within treatments and between treatments. After segregation the 
variation in yield was broken down according to (26), Chapter VIII,
into the variation between treatments, between plots, and the re¬ 
mainder. The F distribution was applied in each case to test the 
hypothesis of no treatment differences. The computations there gave 
the values 


$F = 22$,  $\nu_1 = 4$,  $\nu_2 = 15$

$F = 36$,  $\nu_1 = 4$,  $\nu_2 = 12$

respectively. Since the change in $\nu_2$ has only a slight effect upon the
critical value of F, it is clear that the elimination of plot differences in 
the second F test enabled the treatment differences to be recognized 
more easily and thus produced a more sensitive experiment. 
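
The gain in sensitivity can be seen on a small artificial example. The yields below are hypothetical and are not the data of Chapter VIII; they were chosen so that the 4 plots differ considerably in fertility. The sketch computes the sums of squares for treatments, plots, and remainder, and then forms the two $F$ ratios, first with plot differences left in the error term and then with them removed.

```python
# Hypothetical yields: rows are the 4 plots, columns are the 5 treatments.
yields = [
    [20, 23, 24, 25, 29],
    [25, 26, 30, 31, 32],
    [31, 32, 33, 37, 38],
    [34, 38, 39, 40, 44],
]
n_plots, n_treat = len(yields), len(yields[0])
grand = sum(map(sum, yields)) / (n_plots * n_treat)
plot_means = [sum(row) / n_treat for row in yields]
treat_means = [sum(row[j] for row in yields) / n_plots for j in range(n_treat)]

ss_total = sum((x - grand) ** 2 for row in yields for x in row)
ss_treat = n_plots * sum((t - grand) ** 2 for t in treat_means)
ss_plots = n_treat * sum((p - grand) ** 2 for p in plot_means)
ss_rem = sum(
    (yields[i][j] - plot_means[i] - treat_means[j] + grand) ** 2
    for i in range(n_plots)
    for j in range(n_treat)
)

ms_treat = ss_treat / (n_treat - 1)
# Before segregation: plot differences stay in the error term (15 degrees of freedom).
f_unblocked = ms_treat / ((ss_total - ss_treat) / (n_plots * n_treat - n_treat))
# After segregation: plot differences are removed from the error term (12 degrees of freedom).
f_blocked = ms_treat / (ss_rem / ((n_plots - 1) * (n_treat - 1)))
print(round(f_unblocked, 2), round(f_blocked, 2))
```

Because most of the variation in these artificial yields comes from plot differences, the second ratio is much the larger of the two.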

If there had been other variables in addition to soil fertility that 
were believed to influence yield and that could be controlled statistically 
in much the same manner as fertility was, then a further reduction in 
the variance could be made with a corresponding increase in the sensi¬ 
tivity of the experiment for detecting treatment differences. For 
example, if similar experiments were conducted at different experi¬ 
mental farms or regions, a loss in experimental sensitivity would result 
if the variability arising from farm or region differences were not 
eliminated by the proper analysis of variance. 


TWO TYPES OF ERROR 

If the conclusions to be obtained from an experiment depend upon 
the results of a test of a statistical hypothesis, the best test available 
for testing this hypothesis should be used to increase the sensitivity 
of the experiment. From the preceding chapter, such a test, if it
exists, is one that minimizes the type II error. Although only a few
of the tests of hypotheses presented in this book actually minimize the 
type II error, most of them are considered highly efficient tests for 
testing the hypothesis in question. 

In addition to its use to increase the sensitivity of an experiment, 
the type II error can also assist the experimenter in deciding how large 
his experiment should be. Before consideration can be given to the 
size of an experiment, it is necessary to determine rather carefully 
what the experiment is expected to accomplish. Frequently the experiment
is expected to decide which of two or more procedures or qualities
is preferable. The experimenter would like to be fairly certain that the 
experiment will indicate a difference if and only if a real difference is 
present. This assurance can be obtained by making the probabilities 
of the two types of error arising in the significance test to be used 
sufficiently small. If the probabilities of the two types of error are 
denoted by $\alpha$ and $\beta$, respectively, the experiment should be designed to
be sufficiently large to insure that the test will yield values of $\alpha$ and $\beta$
that will satisfy the experimenter. 

As a simple illustration of how to determine the size of an experi¬ 
ment by means of the two types of error, consider a variation of the 
problem proposed as an application of Theorem I, Chapter IV. There, 
experience gave a mean of 15.6 pounds and a standard deviation of 
2.2 pounds for the breaking strength of samples of a certain brand of 
string. Then a time-saving process was tried which seemed to lower 
the mean somewhat. Suppose, now, that the manufacturer will tolerate 
a drop in the mean to 14.6 pounds but no lower. How large a sample
will be necessary if the manufacturer desires the probability of a type I 
error to be 0.01 and the probability of a type II error to be 0.05? 
The critical region here will correspond to the 1% left tail of the normal 
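curve for $\bar{x}$ and is determined by the inequality

$\bar{x} < m_0 - \tau_{0.02}\, \sigma_{\bar{x}}$

where $\tau_{0.02}$ is the standard normal deviate, namely 2.33, which cuts
off 1% of the area in each tail. For the problem under
consideration, this inequality becomes

(1)  $\bar{x} < 15.6 - 2.33\, \frac{2.2}{\sqrt{n}}$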


Now the probability of a type II error is the probability that $\bar{x}$ will
not fall in this critical region when the hypothesis is false. It is often
more convenient to treat it as 1 minus the probability that $\bar{x}$ will fall
in the critical region when the hypothesis is false. Here it will be
assumed that the hypothesis to be tested is $m = m_0 = 15.6$ and that
the only alternative is $m = m_1 = 14.6$. It is clear that, if $m < 14.6$,
the type II error would be decreased. In order to make the type II
error equal 0.05, it is therefore necessary that the probability be 0.95
that $\bar{x}$ will satisfy (1) when the population mean has dropped to 14.6.
Since $\bar{x}$ is now normally distributed with mean 14.6 and standard
deviation $2.2/\sqrt{n}$, this requirement may be written in the form


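$\frac{\sqrt{n}}{2.2\sqrt{2\pi}} \int_{-\infty}^{15.6 - 2.33\frac{2.2}{\sqrt{n}}} e^{-\frac{n}{2}\left(\frac{\bar{x} - 14.6}{2.2}\right)^2}\, d\bar{x} = 0.95$

Let $y = \sqrt{n}(\bar{x} - 14.6)/2.2$; then this equation reduces to

$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\frac{\sqrt{n}}{2.2} - 2.33} e^{-\frac{y^2}{2}}\, dy = 0.95$

From Table II it follows that $n$ must satisfy the equation

$\frac{\sqrt{n}}{2.2} - 2.33 = 1.64$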


The solution of this equation is n = 76; consequently a sample of this 
size will give the manufacturer the specified protection against an 
incorrect decision. 
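As a rough check on the arithmetic of this example, the following Python sketch solves the same equation for n. The function name `sample_size` and its parameter names are illustrative assumptions, not part of the text. It takes the normal percentage points from the standard library rather than the rounded values 2.33 and 1.64 of Table II, so it returns a value slightly above 76.

```python
from math import ceil
from statistics import NormalDist


def sample_size(m0, m1, sigma, alpha, beta):
    """Sample size for a one-sided test of m = m0 against m = m1 < m0.

    Solves sqrt(n) * (m0 - m1) / sigma - z_alpha = z_beta for n,
    which is the equation obtained in the text from Table II.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # about 2.33 when alpha = 0.01
    z_beta = NormalDist().inv_cdf(1 - beta)    # about 1.64 when beta = 0.05
    return ((z_alpha + z_beta) * sigma / (m0 - m1)) ** 2


# String example: m0 = 15.6, m1 = 14.6, sigma = 2.2, alpha = 0.01, beta = 0.05.
n_exact = sample_size(15.6, 14.6, 2.2, 0.01, 0.05)
print(round(n_exact, 1), ceil(n_exact))
```

Rounding the unrounded solution up, rather than to the nearest integer, is the more cautious choice if both error probabilities are to be no larger than specified.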

Unless the difference between the hypothetical value of a population 
parameter and its alternative value is rather large, the experimenter will 
discover that a considerably larger sample is required than he had 
anticipated. The size of the experiment can be decreased, of course, 
if the probabilities of the two types of error are increased. 

If the experiment can be placed on the basis of a day-to-day accumulation
of data, there are methods which require on the average
smaller samples than those indicated in the procedure just discussed; 
however, if the experiment is such that it is not feasible or convenient 
to design it on other than a fixed-size basis, the preceding procedure 
yields the desired information. The accumulation-of-data method 
referred to will be considered briefly in a later section. 


SAMPLING INSPECTION 

The discussion thus far has been concerned with techniques for 
designing valid experiments and for increasing the sensitivity of such 
experiments. Although there are many other such techniques, they 
will not be considered here. Consideration will now be given to the 
second feature of efficiency that was introduced in the section on
efficiency, namely, minimizing the amount of sampling. 

One of the most useful applications of the design of experiments to 
minimize the amount of sampling occurs in industrial sampling inspection.
If a certain type of sampling procedure is agreed upon, the notion
of the two types of error can be used to advantage to design an efficient 
inspection procedure. 

It is a common practice in industry to accept or reject lots of merchandise
on the basis of a sample drawn from the lot. This practice
arises from the fact that it is often more economical to tolerate a
small percentage of defectives than to bear the cost of 100% inspection.
The basis for accepting a lot of merchandise usually consists in specifying
the maximum number of defective pieces that will be tolerated in
a random sample of a given size. By means of such samples and 
specifications the purchaser is protected against receiving bad lots of 
merchandise. 

Sampling inspection is quite different from quality control. It is a 
method for protecting the purchaser against poor quality after the 
product has been manufactured rather than a method for finding and 
correcting flaws in the manufacturing process, as in quality control 
methods. When sampling inspection methods are applied to continuous 
manufacturing processes, however, they are often useful in helping to 
control the quality of the product. 

From the consumer’s point of view, there is a maximum percentage 
of defectives that he will tolerate. This percentage when expressed as a 
decimal is known as the lot tolerance fraction defective and is denoted
by $p_t$. Without nearly 100% inspection, it may be impossible to be
certain that the quality is better than $p_t$; however, it is possible to set
up a sampling procedure that will insure this quality with a certain 
probability. To this end consider a lot of N pieces from which a 
random sample of n pieces is selected. Let c denote the maximum 
number of defective pieces in the sample for accepting the lot. 

Although numerous sampling schemes are available, only one common
type of sampling procedure, known as single sampling, will be
considered here. This scheme, referred to below as (2), proceeds as follows:

1. Inspect a sample of n pieces. 

2. If the number of defective pieces does not exceed c, accept 
the lot; otherwise inspect the entire lot. 

3. Replace all defective pieces found by non-defective pieces. 

Now consider the probability that the consumer will receive a bad
lot under this sampling procedure. If the lot being considered is one
of precisely $p_t$ fraction defective, there will be $Np_t$ defective and
$N - Np_t$ non-defective pieces in the lot. Then the probability of
obtaining $x$ defectives in a sample of size $n$ is given by the ratio of the
number of ways of obtaining $x$ things from $Np_t$ things and $n - x$
things from $N - Np_t$ things to the number of ways of obtaining $n$
things from $N$ things. By means of the familiar college algebra combination
formula

$$\binom{s}{r} = \frac{s!}{r!\,(s - r)!}$$


which gives the number of ways of obtaining r things from s things, 
this probability may be expressed as 


(3) $$\frac{\binom{Np_t}{x}\binom{N - Np_t}{n - x}}{\binom{N}{n}}$$



The probability that the consumer will be led to accept a lot of quality 
p t will therefore be 


(4) $$P_c = \sum_{x=0}^{c} \frac{\binom{Np_t}{x}\binom{N - Np_t}{n - x}}{\binom{N}{n}}$$



This probability is known as the consumer’s risk. By demanding a
small value of $P_c$, the consumer is adequately protected against poor
quality. The consumer’s risk would be still smaller if the fraction
defective were above the consumer’s tolerance value $p_t$.
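The sum that defines the consumer's risk is easy to evaluate directly. The sketch below is an illustration only: the function name `acceptance_probability` and the numerical values N = 1,000, n = 130, c = 3, and a tolerance of 0.05 are assumptions chosen for the example. It computes the probability of finding at most c defectives in the sample, using exact integer binomial coefficients.

```python
from math import comb


def acceptance_probability(N, n, c, defectives):
    """Probability of at most c defectives in a sample of n pieces
    drawn without replacement from a lot of N pieces that contains
    `defectives` defective pieces."""
    ways = sum(comb(defectives, x) * comb(N - defectives, n - x)
               for x in range(c + 1))
    return ways / comb(N, n)


# Illustrative values (assumed): lot of 1,000, sample of 130, c = 3.
N, n, c, p_t = 1000, 130, 3, 0.05
P_c = acceptance_probability(N, n, c, round(N * p_t))
P_worse = acceptance_probability(N, n, c, round(N * 0.08))
print(f"consumer's risk at p_t = 0.05: {P_c:.3f}")
print(f"acceptance probability at 0.08: {P_worse:.3f}")
```

The second figure is smaller than the first, which illustrates that a lot worse than the tolerance value is accepted less often than a lot exactly at the tolerance value.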

From the producer’s point of view, any sampling scheme for deciding
on the quality of a lot possesses the disadvantage of occasionally
rejecting a lot of satisfactory quality. If the producer has standardized
his quality at a level denoted by $\bar{p}$, which is called the process average
fraction defective, then from (4) the probability that a lot of his will
be unjustly rejected is

(5) $$P_p = 1 - \sum_{x=0}^{c} \frac{\binom{N\bar{p}}{x}\binom{N - N\bar{p}}{n - x}}{\binom{N}{n}}$$

This probability is known as the producer’s risk. It is clear that $P_p$
can be made small by making $\bar{p}$ sufficiently small; however, it may
often be more economical for the producer to admit a fairly large risk
than to attempt to decrease $\bar{p}$.

It will be observed that the consumer and producer risks correspond
to the two types of error in testing hypotheses. For example, if $\bar{p}$
is the hypothetical value of $p$, and $p_t$ is the alternative value, then $P_p$
represents the type I error and $P_c$ represents the type II error. As a
matter of fact, consumer and producer risks preceded the use of the 
two types of error in statistical literature. 

1. Minimum Single Sampling 

Thus far nothing has been said concerning the method of selecting
values of $n$ and $c$. The consumer’s requirements fix the values of $p_t$
and $P_c$ in (4). Since $N$ is specified, (4) places a single restriction on
$n$ and $c$. Now, from the producer’s point of view, one desirable method
of approach is to select that pair of values which minimizes the amount
of inspection. Since a sample of size $n$ is always inspected and the
remainder of the lot is inspected with a relative frequency given by (5),
the average number of pieces inspected per lot under the sampling
scheme (2) will be given by

(6) $$I = n + (N - n)P_p$$

In order to satisfy the consumer’s demands and also minimize the
amount of inspection, it is necessary to find that pair of values of
$n$ and $c$ which satisfies (4) and minimizes (6). These quantities are
difficult to manipulate; consequently the minimizing solution is
obtained numerically for different values of $N$, $p_t$, $\bar{p}$, and for $P_c$ chosen
equal to 0.10. Extensive tables are available for the minimizing values
of $n$ and $c$ under these conditions.

As an illustration, consider a lot of 1,000 pieces for which the process
average is $\bar{p} = 0.01$ and for which the consumer is willing to assume a
risk of $P_c = 0.10$ of accepting a lot with a fraction defective of $p_t = 0.05$.
Upon consulting the proper tables, or working numerically by allowing
$c$ to assume small integral values, it will be found that the minimum
amount of inspection will occur if a sample of 130 is taken and if the
maximum allowable number of defectives is 3. With these values it
will also be found that the average number of pieces inspected per lot
will be 164.
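The numerical search described in this illustration can be sketched in a few lines of Python. This is not the procedure used to build the published tables; the function names and the upper limit `max_c` are assumptions made for the example. Because the sketch evaluates the sums without the approximations that printed tables may employ, its answer need only be close to the values quoted above, not identical to them.

```python
from math import comb


def accept_prob(N, n, c, defectives):
    """Probability of at most c defectives in a sample of n from a lot of N."""
    ways = sum(comb(defectives, x) * comb(N - defectives, n - x)
               for x in range(c + 1))
    return ways / comb(N, n)


def smallest_n(N, c, bad, P_c):
    """Smallest sample size whose consumer's risk does not exceed P_c."""
    for n in range(c + 1, N + 1):
        if accept_prob(N, n, c, bad) <= P_c:
            return n
    return N


def minimum_single_sampling(N, p_t, p_bar, P_c=0.10, max_c=8):
    """Return (n, c, I) minimizing I = n + (N - n) * P_p for c = 0..max_c."""
    bad, typical = round(N * p_t), round(N * p_bar)
    best = None
    for c in range(max_c + 1):
        n = smallest_n(N, c, bad, P_c)
        P_p = 1 - accept_prob(N, n, c, typical)  # producer's risk
        I = n + (N - n) * P_p                    # average inspection per lot
        if best is None or I < best[2]:
            best = (n, c, I)
    return best


n, c, I = minimum_single_sampling(1000, 0.05, 0.01)
print(n, c, round(I, 1))
```

For each value of c only the smallest admissible n is examined, since for fixed c both n and the producer's risk increase with n, and therefore so does the average amount of inspection.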

2. Average Outgoing Quality Limit 

A somewhat different approach to the problem of protecting the 
consumer from an inferior product is to attempt to guarantee him a 
certain quality level of the product after inspection regardless of what 
quality level is being maintained by the producer. Toward this end, 
consider the problem of determining the mean value of the fraction
defective after inspection if the producer’s fraction defective is p. 

From (2) it is clear that there will be no defectives left in a lot of
$N$ if the sample gives a number of defectives, $x$, greater than $c$ because
then the entire lot will be inspected. It also follows from (2) that the
number of defectives left in a lot of $N$ after inspection when $x \le c$
will be $Np - x$ because now only the $x$ defectives of the sample will
be replaced by non-defectives. From (3) the probability of obtaining
$x$ defectives is

$$P(x) = \frac{\binom{Np}{x}\binom{N - Np}{n - x}}{\binom{N}{n}}$$



Since the mean value of a discrete variable $x$ that takes on the values
$x_1, \cdots, x_k$ is given by

$$m = \sum_{i=1}^{k} x_i P(x_i)$$

the mean value of the number of defectives after inspection will be
given by

$$m = \sum_{x=0}^{c} (Np - x)P(x) + \sum_{x=c+1}^{n} 0 \cdot P(x)$$


If this expression is divided by $N$, it will give the mean fraction defective
in lots of $N$ when following the inspection procedure (2). If this mean
value is denoted by $\tilde{p}$, it follows that

(7) $$\tilde{p} = \sum_{x=0}^{c} \left(p - \frac{x}{N}\right) P(x)$$

If the sampling procedure (2) has been specified, the values of $N$, $n$,
and $c$ may be treated as given. The consumer, however, is not likely
to be willing to accept the producer’s claim that his fraction defective is
$p$; consequently $p$ may not be treated as given. If $\tilde{p}$ is considered a
function of $p$, it will be found that $\tilde{p}$ possesses a maximum value. This
maximum value, which will be denoted by $p_L$, is called the average
outgoing quality limit. It is a number such that, regardless of what the
producer's fraction defective may be, the average fraction defective
after inspection never exceeds $p_L$. It might appear offhand that the
value of $\tilde{p}$ would continue to increase with $p$; however, as $p$ increases a
greater percentage of lots will be sampled 100%, with a resulting
eventual decrease in the average percentage of defectives remaining.

The average outgoing quality limit has a certain appeal to many
consumers that is not possessed by the protection afforded through a
specified consumer's risk.

It is usually possible to select several pairs of values of $c$ and $n$
that will yield functions, $\tilde{p}$, having approximately the same value of
$p_L$. From the producer's point of view, it would be highly desirable
to select that pair of values which minimizes the amount of inspection
given by (6). As in the minimum single sampling of the preceding
section, the minimizing pair of values of $c$ and $n$ is obtained numerically.
Tables are available for determining these minimizing values corresponding
to useful ranges of values of $N$, $p_L$, and $\bar{p}$. It should be noted
that the value of $\bar{p}$ is required in order to minimize $I$, just as it was in
minimum single sampling.

As an illustration, consider the problem that was used as an illustration
for minimum single sampling. Then $N = 1,000$, $p_t = 0.05$, $\bar{p} = 0.01$,
and $P_c = 0.10$. The Dodge and Romig tables referred to at
the end of this chapter show that $p_L = 0.013$ for this problem. If the
consumer wishes an average outgoing quality limit of, say, $p_L = 0.03$,
these tables give $c = 2$ and $n = 44$ as the values that will minimize the
amount of inspection.
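The average outgoing quality limit can also be located numerically by evaluating the mean fraction defective after inspection for every possible number of defectives in the lot and taking the largest value. The following Python sketch does this under the assumption that the lot contains a whole number D of defectives, so that p = D/N; the function names are illustrative. Since it uses no approximations, its results may differ slightly in the third decimal place from tabled values.

```python
from math import comb


def hypergeom_pmf(x, N, D, n):
    """Probability of x defectives in a sample of n from a lot of N
    containing D defectives."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)


def average_outgoing_quality(N, n, c, D):
    """Mean fraction defective after inspection when p = D / N:
    the sum over x <= c of (p - x/N) P(x)."""
    return sum((D - x) / N * hypergeom_pmf(x, N, D, n)
               for x in range(min(c, D) + 1))


def aoql(N, n, c):
    """Largest average outgoing quality over all lot qualities D / N."""
    return max(average_outgoing_quality(N, n, c, D) for D in range(N + 1))


limit_130 = aoql(1000, 130, 3)  # plan from the minimum single sampling example
limit_44 = aoql(1000, 44, 2)    # plan quoted for a limit of about 0.03
print(round(limit_130, 4), round(limit_44, 4))
```

Both values fall close to the limits 0.013 and 0.03 quoted in the illustration.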


STRATIFIED SAMPLING 

The technique of breaking down the variation of a variable into useful 
components in order to decrease the experimental variation, as was 
done in the analysis of variance, can also be used to advantage in 
designing experiments for estimating means of populations. It turns 
out that a more accurate estimate of the mean can often be obtained 
by taking restricted random samples than by taking completely 
random samples. For example, suppose that an accurate estimate of 
the mean weight of fifth-grade pupils was desired for a school system. 
By taking the proper-size random samples in the various age groups, 
or in the various schools of the system, a more accurate estimate of 
the population mean will usually be obtained than by taking the same 
total sample at random in the system. In order to determine the
proper size subsamples, consider the following general problem. 

Let a population be divided into $k$ distinct subpopulations. Further,
let the mean and variance of this population be $m$ and $\sigma^2$ and of the $i$th
subpopulation be $m_i$ and $\sigma_i^2$. Then consider as estimates of $m$ the
quantities $\bar{x}$ and $\bar{x}_r$, where $\bar{x}$ is the mean of a random sample of size
$n$ and where

(8) $$\bar{x}_r = \frac{n_1}{n}\bar{x}_1 + \cdots + \frac{n_k}{n}\bar{x}_k$$

in which $\bar{x}_i$ is the mean of a random sample of size $n_i$ drawn from the $i$th
subpopulation and $\sum_{i=1}^{k} n_i = n$. This restricted type of random sampling
is called stratified sampling.

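For the purpose of comparing the relative precision of these two
estimates of $m$, consider their respective variances. The variance of $\bar{x}$
is given by $\sigma_{\bar{x}}^2 = \sigma^2/n$. Since the $\bar{x}_i$ are independent, the variance of
(8) is given by

(9) $$\sigma_{\bar{x}_r}^2 = \sum_{i=1}^{k}\left(\frac{n_i}{n}\right)^2 \sigma_{\bar{x}_i}^2 = \sum_{i=1}^{k}\left(\frac{n_i}{n}\right)^2 \frac{\sigma_i^2}{n_i} = \sum_{i=1}^{k} \frac{n_i \sigma_i^2}{n^2}$$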


In order to express the variance of $\bar{x}$ in terms of the $\sigma_i^2$, it is necessary
to express the distribution function of the population in terms
of those of the subpopulations. This may be done by applying the
two basic rules of probability to the problem of determining the probability
that $x$ will assume a value within any specified interval. If $p_i$
denotes the probability that $x$ will come from the $i$th subpopulation
and $f_i(x)$ denotes the distribution function for this subpopulation,

$$p_i \int_{\alpha}^{\beta} f_i(x)\,dx$$

represents the probability that $x$ will come from the $i$th subpopulation
and will assume a value between $\alpha$ and $\beta$. Since these subpopulations
are mutually exclusive, the probability that $x$ will assume a value
between $\alpha$ and $\beta$ is the sum of all such probabilities; hence

$$\int_{\alpha}^{\beta} f(x)\,dx = \sum_{i=1}^{k} p_i \int_{\alpha}^{\beta} f_i(x)\,dx$$

But $\alpha$ and $\beta$ are arbitrary; consequently by the same reasoning as was
followed on (5), Chapter VI,

$$f(x) = p_1 f_1(x) + \cdots + p_k f_k(x)$$

Hence

(10) $$m = \int x f(x)\,dx = p_1 m_1 + \cdots + p_k m_k$$

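Furthermore

$$\sigma^2 + m^2 = \int x^2 f(x)\,dx = p_1[\sigma_1^2 + m_1^2] + \cdots + p_k[\sigma_k^2 + m_k^2]$$

If the value of $m^2$ is eliminated by means of (10) and the fact that
$\sum_{i=1}^{k} p_i = 1$, this reduces to

$$\sigma^2 = \sum_{i=1}^{k} p_i[\sigma_i^2 + (m_i - m)^2]$$

From this result it follows that the variance of $\bar{x}$ can be written in the
form

(11) $$\sigma_{\bar{x}}^2 = \frac{1}{n}\sum_{i=1}^{k} p_i[\sigma_i^2 + (m_i - m)^2]$$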

Now consider a special type of sampling called representative sampling
in which the subpopulation sample sizes, $n_i$, are chosen so that
$n_i/n = p_i$. For a finite population this means that the relative sizes of the
subpopulation samples are chosen equal to the relative sizes of the
subpopulations. For representative sampling, (11) may be reduced
by means of (9) to the form

(12) $$\sigma_{\bar{x}}^2 = \sigma_{\bar{x}_r}^2 + \frac{1}{n}\sum_{i=1}^{k} p_i (m_i - m)^2$$

This shows that $\sigma_{\bar{x}}^2 > \sigma_{\bar{x}_r}^2$, unless the subpopulations have equal
means. Representative sampling is of particular advantage for populations
whose subpopulations have widely differing means.

Public-opinion polls are familiar examples of representative sampling. 
For such polls it is customary to stratify the population in several 
ways. For example, it may be divided into several income groups, into 
several vocational groups, etc. Then, within strata, random samples 
are taken proportional to the relative sizes of those strata. 

Various other types of restricted random sampling are available,
most of which have been developed by governmental agencies for their 
particular needs. 

As an illustration of the increased precision of estimating m through 
the use of representative sampling, suppose for the sake of simplicity 
that a district is made up of 45% democrats and 55% republicans, and 
that 70% of the democrats will vote for a certain “non-partisan” 
candidate in a primary election but only 20% of the republicans will 
do so. Now suppose that a sample of size 200 is taken by each method. 
Although experience indicates that the precision of poll percentages is 
not as great as that given by binomial theory, the precisions here will 
be compared on a theoretical basis; consequently 


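$$\sigma_{\bar{x}}^2 = \frac{(0.425)(0.575)}{200} = 0.00122$$

and

$$\frac{1}{n}\sum_{i=1}^{k} p_i (m_i - m)^2 = \frac{0.45}{200}(0.70 - 0.425)^2 + \frac{0.55}{200}(0.20 - 0.425)^2 = 0.00031$$

Therefore from (12)

$$\sigma_{\bar{x}_r}^2 = 0.00091$$

Since $\sigma_{\bar{x}_r}^2/\sigma_{\bar{x}}^2 = 0.75$ here, a considerable increase in precision would
result from using representative sampling in preference to pure random
sampling.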
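The figures of this illustration can be reproduced with a short Python sketch. The function names are illustrative assumptions. For a proportion, the variance within a stratum whose proportion is $m_i$ is taken to be $m_i(1 - m_i)$, in keeping with the binomial basis of the comparison.

```python
def variance_random(p_strata, means, n):
    """Variance of a sample proportion under pure random sampling."""
    m = sum(p * mi for p, mi in zip(p_strata, means))  # over-all proportion
    return m * (1 - m) / n


def variance_representative(p_strata, means, n):
    """Variance under representative sampling, n_i / n = p_i, so that
    (9) becomes (1/n) * sum of p_i * sigma_i^2 with sigma_i^2 = m_i (1 - m_i)."""
    return sum(p * mi * (1 - mi) for p, mi in zip(p_strata, means)) / n


p_strata = [0.45, 0.55]  # relative sizes of the two parties in the district
means = [0.70, 0.20]     # proportion within each party favoring the candidate
var_random = variance_random(p_strata, means, 200)
var_rep = variance_representative(p_strata, means, 200)
ratio = var_rep / var_random
print(round(var_random, 5), round(var_rep, 5), round(ratio, 2))
```

The difference between the two variances equals the between-strata term computed in the illustration, as (12) requires.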

SEQUENTIAL ANALYSIS 

The methods that have been presented thus far for minimizing the 
amount of sampling needed to attain certain objectives were designed 
on the assumption that the experiment was to be of fixed size, once 
the minimizing size had been determined. If the experiment can be 
conducted on an accumulation-of-information basis, there are methods 
that require considerably less sampling than even the best of the fixed-size
methods. These methods are known as sequential methods because
they operate upon the successive terms of the sequence of observations 
as they are received. These methods have been found to require only 
about 50% as much sampling on the average as the best fixed methods 
for some problems. 

Sequential methods were designed to test hypotheses. In a sequential
test, a rule of procedure is given for making one of the following three
decisions at each stage of the experiment: (1) accept the hypothesis,
(2) reject the hypothesis, (3) continue the experiment by taking an
additional observation.

For the purpose of describing a sequential test, consider a variable $x$
whose probability function $f(x; \theta)$ depends upon the single parameter $\theta$.
If $x$ is a continuous variable, $f(x; \theta)$ represents the probability density
at the point $x$; if $x$ is discrete, it represents the probability that the
variable will assume the value $x$. Let the hypothesis to be tested be
denoted by $\theta = \theta_0$, and let there be but the single alternative $\theta = \theta_1$.
Then form the likelihood ratio


(13) $$\frac{p_{1m}}{p_{0m}} = \frac{f(x_1; \theta_1) f(x_2; \theta_1) \cdots f(x_m; \theta_1)}{f(x_1; \theta_0) f(x_2; \theta_0) \cdots f(x_m; \theta_0)}$$

where $x_1, \cdots, x_m$ represent $m$ random-sample values of $x$. Finally,
let $\alpha$ and $\beta$ represent the probabilities of making a type I and type II
error, respectively. Then the sequential test known as the sequential
probability ratio test proceeds as follows:


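(14)

1. If $\dfrac{p_{1m}}{p_{0m}} \le \dfrac{\beta}{1 - \alpha}$, accept the hypothesis that $\theta = \theta_0$.

2. If $\dfrac{p_{1m}}{p_{0m}} \ge \dfrac{1 - \beta}{\alpha}$, accept the alternative that $\theta = \theta_1$.

3. If $\dfrac{\beta}{1 - \alpha} < \dfrac{p_{1m}}{p_{0m}} < \dfrac{1 - \beta}{\alpha}$, take an additional observation.

This procedure is continued until either 1 or 2 is satisfied.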

Because certain approximations were used to obtain these inequalities,
it is not strictly true that the two types of error will be maintained
at the levels given by $\alpha$ and $\beta$; however, since these approximations
have been found to be excellent for ordinary applications, this test
may be used with confidence.

For most applications there will be more than one alternative to the
hypothesis; nevertheless the problem can often be solved satisfactorily
by considering only one alternative. For example, in the section on
sampling inspection, the producer's fraction defective $\bar{p}$ was contrasted
with the consumer's tolerance fraction defective $p_t$ on the grounds that
the consumer's risk would be even smaller than that calculated if the
fraction defective were larger than $p_t$. In most applications there
will be a difference $|\theta - \theta_0| = \Delta$ such that it will be profitable to make
a change from $\theta_0$ only if $|\theta - \theta_0| \ge \Delta$; consequently, if $\theta_1$ is selected
as that alternative value of $\theta$ for which $|\theta_1 - \theta_0| = \Delta$, any alternative
satisfying the practical inequality will give rise to smaller type I or
type II errors than those for $\theta_1$.

Although the test given by (14) can be applied to numerous types of
problems, it will be applied here only to the problem of testing a
binomial probability. Consider the hypothesis $p = p_0$ and the single
alternative $p = p_1$. If $x = 1$ for success and $x = 0$ for failure, $f(x; \theta)$
will reduce to $f(1; p) = p$ and $f(0; p) = q$. Now suppose that there
are $d_m$ successes in the first $m$ trials of the event. Then (13) becomes

$$\frac{p_{1m}}{p_{0m}} = \frac{p_1^{d_m} q_1^{m - d_m}}{p_0^{d_m} q_0^{m - d_m}}$$

If this expression is substituted in (14) and the desired numerical
values are assigned to $p_0$, $p_1$, $\alpha$, and $\beta$, the test procedure will be
determined.

As a numerical illustration, let $p_0 = 0.5$, $p_1 = 0.7$, $\alpha = 0.10$, and
$\beta = 0.20$. These values may be thought of as those that might be
used to test the honesty of a coin when that coin is suspected of giving
too many heads. Here $\beta/(1 - \alpha) = 2/9$, $(1 - \beta)/\alpha = 8$, and

$$\frac{p_{1m}}{p_{0m}} = \frac{(0.7)^{d_m}(0.3)^{m - d_m}}{(0.5)^{d_m}(0.5)^{m - d_m}} = \left(\frac{3}{5}\right)^{m}\left(\frac{7}{3}\right)^{d_m}$$

The first inequality in (14),

$$\left(\frac{3}{5}\right)^{m}\left(\frac{7}{3}\right)^{d_m} \le \frac{2}{9}$$

can be written more conveniently in the form

$$d_m \le \frac{\log \frac{2}{9}}{\log \frac{7}{3}} + m\,\frac{\log \frac{5}{3}}{\log \frac{7}{3}}$$

In a similar manner the second inequality becomes

$$d_m \ge \frac{\log 8}{\log \frac{7}{3}} + m\,\frac{\log \frac{5}{3}}{\log \frac{7}{3}}$$

If these logarithms are evaluated, the test will proceed as follows:

1. If $d_m \le -1.78 + 0.603m$, accept $p = 0.5$.

2. If $d_m \ge 2.45 + 0.603m$, accept $p = 0.7$.

3. If neither inequality is satisfied, take another trial.

Tosses of a coin gave the results shown in the following table. For the
purpose of determining when one of the inequalities is satisfied, it is
convenient to represent these inequalities and the results of the successive
trials graphically. If $m$ and $d_m$ are treated as the coordinates of a
point, the straight lines $d_m = -1.78 + 0.603m$ and $d_m = 2.45 + 0.603m$
will serve to divide the $m$, $d_m$ plane into three regions corresponding
to the three possible decisions at each trial. The graph corresponding
to this problem is given in Fig. 1. From this graph it will be observed
that the experiment terminated after 15 trials because inequality 1
was then satisfied. In accepting the hypothesis that $p = 0.5$, the
experimenter does so in preference to accepting the hypothesis that
$p = 0.7$.

Fig. 1. Sequential test for testing $p = 0.5$ against $p = 0.7$. (Horizontal axis: $m$, from 0 to 16.)
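The decision rule for the binomial case lends itself to a brief Python sketch. The function names `sprt_lines` and `sprt` are illustrative assumptions, and the toss sequences used below are artificial ones chosen only to exercise the rule; they are not the tosses recorded in the text.

```python
from math import log


def sprt_lines(p0, p1, alpha, beta):
    """Intercepts and common slope of the two boundary lines for d_m.

    Accept p0 when d_m <= lower + slope * m;
    accept p1 when d_m >= upper + slope * m.
    """
    denom = log(p1 / p0) - log((1 - p1) / (1 - p0))  # log[(p1 q0) / (p0 q1)]
    slope = log((1 - p0) / (1 - p1)) / denom
    lower = log(beta / (1 - alpha)) / denom
    upper = log((1 - beta) / alpha) / denom
    return lower, upper, slope


def sprt(tosses, p0, p1, alpha, beta):
    """Apply the test to a sequence of 0's and 1's; return (decision, trials)."""
    lower, upper, slope = sprt_lines(p0, p1, alpha, beta)
    d = 0
    for m, x in enumerate(tosses, start=1):
        d += x  # running number of successes d_m
        if d <= lower + slope * m:
            return "accept p0", m
        if d >= upper + slope * m:
            return "accept p1", m
    return "continue", len(tosses)


lower, upper, slope = sprt_lines(0.5, 0.7, 0.10, 0.20)
print(round(lower, 2), round(upper, 2), round(slope, 3))
all_tails = sprt([0] * 10, 0.5, 0.7, 0.10, 0.20)
all_heads = sprt([1] * 10, 0.5, 0.7, 0.10, 0.20)
print(all_tails, all_heads)
```

The intercepts and slope agree with the values -1.78, 2.45, and 0.603 obtained above. An unbroken run of tails leads to acceptance of $p = 0.5$ sooner than an unbroken run of heads leads to acceptance of $p = 0.7$, because the lower boundary lies closer to the origin.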

If the alternative to p = 0.5 had been p = 0.6, say, a considerably 
larger number of trials would have been required on the average to 
arrive at a decision with these same values of $\alpha$ and $\beta$. By selecting the
alternative value properly, the experimenter can design his experiment 
in such a manner as to discover profitable differences in p for a minimum 
amount of inspection. 


EXERCISES


1. Using the table of binomials referred to in Fry’s book, determine the percentage
error in replacing the binomial sum given in (4) by

$$\sum_{x=0}^{c} \frac{e^{-np_t}(np_t)^x}{x!}$$

for the case in which $N = 50$, $n = 10$, $p_t = 0.1$, and $c = 2$.

2. Using the Poisson approximation

$$\sum_{x=0}^{c} \frac{e^{-np}(np)^x}{x!}$$

with $p$ replaced by $p_t$ and $\bar{p}$, respectively, for the probabilities given by (4) and (6),
verify by trying neighboring values of the variables $c$ and $n$ that the values given
in the illustrative example for minimum single sampling are approximately correct.

3. Using the Poisson approximations of the preceding problem, determine by
numerical methods the values of $n$ and $c$ which minimize the amount of inspection
for $N = 400$, $P_c = 0.10$, $p_t = 0.05$, and $\bar{p} = 0.02$. Proceed by assigning $c$ a
value, beginning with 2, then determining the value of $n$ to satisfy (4), and finally
selecting that pair of values which makes (6) a minimum.

4. Derive the sequential test for testing the hypothesis $m = m_0$ against the alternative
$m = m_1$ for a Poisson distribution.

5. By the use of Tippett's random sampling numbers draw repeated samples
from the Poisson population with $m = 2$. Use the test derived in the preceding
problem on these sample values to test the hypothesis $m = 2$ against the alternative
$m = 3$ for a Poisson distribution.

6. Derive the sequential test for testing the hypothesis $m = m_0$ against the
alternative $m = m_1$ for a normal distribution with known variance $\sigma^2$.

7. How would you proceed if you wished to design an analysis-of-variance experiment
for testing the speed of two (or more) different methods of performing a
job with a certain type of machine? Consider different operators and different
machines as variables to be incorporated in the design.

6.19677

6.20484 

6.21289 

6.22093 

6.22896 

6.23699 


6.25300 

6.26099 

6.26897 

6.27694 

6.28490 

6.29285 

6.30079 

6.30872 

6.31664 


6.32456 


   N        N²        √N      √(10N)

  4.00    16.0000   2.00000   6.32456
  4.01    16.0801   2.00250   6.33246
  4.02    16.1604   2.00499   6.34035
  4.03    16.2409   2.00749   6.34823
  4.04    16.3216   2.00998   6.35610
  4.05    16.4025   2.01246   6.36396
  4.06    16.4836   2.01494   6.37181
  4.07    16.5649   2.01742   6.37966
  4.08    16.6464   2.01990   6.38749
  4.09    16.7281   2.02237   6.39531
  4.10    16.8100   2.02485   6.40312
  4.11    16.8921   2.02731   6.41093
  4.12    16.9744   2.02978   6.41872
  4.13    17.0569   2.03224   6.42651
  4.14    17.1396   2.03470   6.43428
  4.15    17.2225   2.03715   6.44205
  4.16    17.3056   2.03961   6.44981
  4.17    17.3889   2.04206   6.45755
  4.18    17.4724   2.04450   6.46529
  4.19    17.5561   2.04695   6.47302
  4.20    17.6400   2.04939   6.48074
  4.21    17.7241   2.05183   6.48845
  4.22    17.8084   2.05426   6.49615
  4.23    17.8929   2.05670   6.50384
  4.24    17.9776   2.05913   6.51153
  4.25    18.0625   2.06155   6.51920
  4.26    18.1476   2.06398   6.52687
  4.27    18.2329   2.06640   6.53452
  4.28    18.3184   2.06882   6.54217
  4.29    18.4041   2.07123   6.54981
  4.30    18.4900   2.07364   6.55744
  4.31    18.5761   2.07605   6.56506
  4.32    18.6624   2.07846   6.57267
  4.33    18.7489   2.08087   6.58027
  4.34    18.8356   2.08327   6.58787
  4.35    18.9225   2.08567   6.59545
  4.36    19.0096   2.08806   6.60303
  4.37    19.0969   2.09045   6.61060
  4.38    19.1844   2.09284   6.61816
  4.39    19.2721   2.09523   6.62571
  4.40    19.3600   2.09762   6.63325
  4.41    19.4481   2.10000   6.64078
  4.42    19.5364   2.10238   6.64831
  4.43    19.6249   2.10476   6.65582
  4.44    19.7136   2.10713   6.66333
  4.45    19.8025   2.10950   6.67083
  4.46    19.8916   2.11187   6.67832
  4.47    19.9809   2.11424   6.68581
  4.48    20.0704   2.11660   6.69328
  4.49    20.1601   2.11896   6.70075
  4.50    20.2500   2.12132   6.70820





















































SQUARES AND SQUARE ROOTS

   N        N²        √N      √(10N)        N        N²        √N      √(10N)

  5.00    25.0000   2.23607   7.07107      5.50    30.2500   2.34521   7.41620
  5.01    25.1001   2.23830   7.07814      5.51    30.3601   2.34734   7.42294
  5.02    25.2004   2.24054   7.08520      5.52    30.4704   2.34947   7.42967
  5.03    25.3009   2.24277   7.09225      5.53    30.5809   2.35160   7.43640
  5.04    25.4016   2.24499   7.09930      5.54    30.6916   2.35372   7.44312
  5.05    25.5025   2.24722   7.10634      5.55    30.8025   2.35584   7.44983
  5.06    25.6036   2.24944   7.11337      5.56    30.9136   2.35797   7.45654
  5.07    25.7049   2.25167   7.12039      5.57    31.0249   2.36008   7.46324
  5.08    25.8064   2.25389   7.12741      5.58    31.1364   2.36220   7.46994
  5.09    25.9081   2.25610   7.13442      5.59    31.2481   2.36432   7.47663
  5.10    26.0100   2.25832   7.14143      5.60    31.3600   2.36643   7.48331
  5.11    26.1121   2.26053   7.14843      5.61    31.4721   2.36854   7.48999
  5.12    26.2144   2.26274   7.15542      5.62    31.5844   2.37065   7.49667
  5.13    26.3169   2.26495   7.16240      5.63    31.6969   2.37276   7.50333
  5.14    26.4196   2.26716   7.16938      5.64    31.8096   2.37487   7.50999
  5.15    26.5225   2.26936   7.17635      5.65    31.9225   2.37697   7.51665
  5.16    26.6256   2.27156   7.18331      5.66    32.0356   2.37908   7.52330
  5.17    26.7289   2.27376   7.19027      5.67    32.1489   2.38118   7.52994
  5.18    26.8324   2.27596   7.19722      5.68    32.2624   2.38328   7.53658
  5.19    26.9361   2.27816   7.20417      5.69    32.3761   2.38537   7.54321
  5.20    27.0400   2.28035   7.21110      5.70    32.4900   2.38747   7.54983
  5.21    27.1441   2.28254   7.21803      5.71    32.6041   2.38956   7.55645
  5.22    27.2484   2.28473   7.22496      5.72    32.7184   2.39165   7.56307
  5.23    27.3529   2.28692   7.23187      5.73    32.8329   2.39374   7.56968
  5.24    27.4576   2.28910   7.23878      5.74    32.9476   2.39583   7.57628
  5.25    27.5625   2.29129   7.24569      5.75    33.0625   2.39792   7.58288
  5.26    27.6676   2.29347   7.25259      5.76    33.1776   2.40000   7.58947
  5.27    27.7729   2.29565   7.25948      5.77    33.2929   2.40208   7.59605
  5.28    27.8784   2.29783   7.26636      5.78    33.4084   2.40416   7.60263
  5.29    27.9841   2.30000   7.27324      5.79    33.5241   2.40624   7.60920
  5.30    28.0900   2.30217   7.28011      5.80    33.6400   2.40832   7.61577
  5.31    28.1961   2.30434   7.28697      5.81    33.7561   2.41039   7.62234
  5.32    28.3024   2.30651   7.29383      5.82    33.8724   2.41247   7.62889
  5.33    28.4089   2.30868   7.30068      5.83    33.9889   2.41454   7.63544
  5.34    28.5156   2.31084   7.30753      5.84    34.1056   2.41661   7.64199
  5.35    28.6225   2.31301   7.31437      5.85    34.2225   2.41868   7.64853
  5.36    28.7296   2.31517   7.32120      5.86    34.3396   2.42074   7.65506
  5.37    28.8369   2.31733   7.32803      5.87    34.4569   2.42281   7.66159
  5.38    28.9444   2.31948   7.33485      5.88    34.5744   2.42487   7.66812
  5.39    29.0521   2.32164   7.34166      5.89    34.6921   2.42693   7.67463
  5.40    29.1600   2.32379   7.34847      5.90    34.8100   2.42899   7.68115
  5.41    29.2681   2.32594   7.35527      5.91    34.9281   2.43105   7.68765
  5.42    29.3764   2.32809   7.36206      5.92    35.0464   2.43311   7.69415
  5.43    29.4849   2.33024   7.36885      5.93    35.1649   2.43516   7.70065
  5.44    29.5936   2.33238   7.37564      5.94    35.2836   2.43721   7.70714
  5.45    29.7025   2.33452   7.38241      5.95    35.4025   2.43926   7.71362
  5.46    29.8116   2.33666   7.38918      5.96    35.5216   2.44131   7.72010
  5.47    29.9209   2.33880   7.39594      5.97    35.6409   2.44336   7.72658
  5.48    30.0304   2.34094   7.40270      5.98    35.7604   2.44540   7.73305
  5.49    30.1401   2.34307   7.40945      5.99    35.8801   2.44745   7.73951
  5.50    30.2500   2.34521   7.41620      6.00    36.0000   2.44949   7.74597

   N        N²        √N      √(10N)        N        N²        √N      √(10N)

  6.00    36.0000   2.44949   7.74597      6.50    42.2500   2.54951   8.06226
  6.01    36.1201   2.45153   7.75242      6.51    42.3801   2.55147   8.06846
  6.02    36.2404   2.45357   7.75887      6.52    42.5104   2.55343   8.07465
  6.03    36.3609   2.45561   7.76531      6.53    42.6409   2.55539   8.08084
  6.04    36.4816   2.45764   7.77174      6.54    42.7716   2.55734   8.08703
  6.05    36.6025   2.45967   7.77817      6.55    42.9025   2.55930   8.09321
  6.06    36.7236   2.46171   7.78460      6.56    43.0336   2.56125   8.09938
  6.07    36.8449   2.46374   7.79102      6.57    43.1649   2.56320   8.10555
  6.08    36.9664   2.46577   7.79744      6.58    43.2964   2.56515   8.11172
  6.09    37.0881   2.46779   7.80385      6.59    43.4281   2.56710   8.11788
  6.10    37.2100   2.46982   7.81025      6.60    43.5600   2.56905   8.12404
  6.11    37.3321   2.47184   7.81665      6.61    43.6921   2.57099   8.13019
  6.12    37.4544   2.47386   7.82304      6.62    43.8244   2.57294   8.13634
  6.13    37.5769   2.47588   7.82943      6.63    43.9569   2.57488   8.14248
  6.14    37.6996   2.47790   7.83582      6.64    44.0896   2.57682   8.14862
  6.15    37.8225   2.47992   7.84219      6.65    44.2225   2.57876   8.15475
  6.16    37.9456   2.48193   7.84857      6.66    44.3556   2.58070   8.16088
  6.17    38.0689   2.48395   7.85493      6.67    44.4889   2.58263   8.16701
  6.18    38.1924   2.48596   7.86130      6.68    44.6224   2.58457   8.17313
  6.19    38.3161   2.48797   7.86766      6.69    44.7561   2.58650   8.17924
  6.20    38.4400   2.48998   7.87401      6.70    44.8900   2.58844   8.18535
  6.21    38.5641   2.49199   7.88036      6.71    45.0241   2.59037   8.19146
  6.22    38.6884   2.49399   7.88670      6.72    45.1584   2.59230   8.19756
  6.23    38.8129   2.49600   7.89303      6.73    45.2929   2.59422   8.20366
  6.24    38.9376   2.49800   7.89937      6.74    45.4276   2.59615   8.20975
  6.25    39.0625   2.50000   7.90569      6.75    45.5625   2.59808   8.21584
  6.26    39.1876   2.50200   7.91202      6.76    45.6976   2.60000   8.22192
  6.27    39.3129   2.50400   7.91833      6.77    45.8329   2.60192   8.22800
  6.28    39.4384   2.50599   7.92465      6.78    45.9684   2.60384   8.23408
  6.29    39.5641   2.50799   7.93095      6.79    46.1041   2.60576   8.24015
  6.30    39.6900   2.50998   7.93725      6.80    46.2400   2.60768   8.24621
  6.31    39.8161   2.51197   7.94355      6.81    46.3761   2.60960   8.25227
  6.32    39.9424   2.51396   7.94984      6.82    46.5124   2.61151   8.25833
  6.33    40.0689   2.51595   7.95613      6.83    46.6489   2.61343   8.26438
  6.34    40.1956   2.51794   7.96241      6.84    46.7856   2.61534   8.27043
  6.35    40.3225   2.51992   7.96869      6.85    46.9225   2.61725   8.27647
  6.36    40.4496   2.52190   7.97496      6.86    47.0596   2.61916   8.28251
  6.37    40.5769   2.52389   7.98123      6.87    47.1969   2.62107   8.28855
  6.38    40.7044   2.52587   7.98749      6.88    47.3344   2.62298   8.29458
  6.39    40.8321   2.52784   7.99375      6.89    47.4721   2.62488   8.30060
  6.40    40.9600   2.52982   8.00000      6.90    47.6100   2.62679   8.30662
  6.41    41.0881   2.53180   8.00625      6.91    47.7481   2.62869   8.31264
  6.42    41.2164   2.53377   8.01249      6.92    47.8864   2.63059   8.31865
  6.43    41.3449   2.53574   8.01873      6.93    48.0249   2.63249   8.32466

6.44 

41.4736 

2.53772 

8.02496 

■ 

6.94 

48.1636 

2.63439 

8.33067 

6.45 

41.6025 

2.53969 

8.03119 

■ 

6.95 

48.3025 

2.63629 

8.33667 

6.46 

41.7316 

2.54165 

8.03741 

■ 

6.96 

48.4416 

2.63818 

8.34266 

6.47 

41.8609 

2.54362 

8.04363 

■ 

6.97 

48.5809 

2.64008 

8.34865 

6.48 

41.9904 

2.54558 

8.04984 

■ 

6.98 

48.7204 

2.64197 

8.35464 

6.49 

42.1201 

2.54755 

8.05605 

H 

6.99 

48.8601 

2.64386 

8.36062 

6.50 

42.2500 

2.54951 

8.06226 

■ 

7.00 

49.0000 

2.64575 

8.36660 

N 

N* 

Vw 

VlON 

1 

N 

N* 

\/N 

 N       N²        √N      √(10N)       N       N²        √N      √(10N)

7.00   49.0000   2.64575   8.36660      7.50   56.2500   2.73861   8.66025
7.01   49.1401   2.64764   8.37257      7.51   56.4001   2.74044   8.66603
7.02   49.2804   2.64953   8.37854      7.52   56.5504   2.74226   8.67179
7.03   49.4209   2.65141   8.38451      7.53   56.7009   2.74408   8.67756
7.04   49.5616   2.65330   8.39047      7.54   56.8516   2.74591   8.68332
7.05   49.7025   2.65518   8.39643      7.55   57.0025   2.74773   8.68907
7.06   49.8436   2.65707   8.40238      7.56   57.1536   2.74955   8.69483
7.07   49.9849   2.65895   8.40833      7.57   57.3049   2.75136   8.70057
7.08   50.1264   2.66083   8.41427      7.58   57.4564   2.75318   8.70632
7.09   50.2681   2.66271   8.42021      7.59   57.6081   2.75500   8.71206
7.10   50.4100   2.66458   8.42615      7.60   57.7600   2.75681   8.71780
7.11   50.5521   2.66646   8.43208      7.61   57.9121   2.75862   8.72353
7.12   50.6944   2.66833   8.43801      7.62   58.0644   2.76043   8.72926
7.13   50.8369   2.67021   8.44393      7.63   58.2169   2.76225   8.73499
7.14   50.9796   2.67208   8.44985      7.64   58.3696   2.76405   8.74071
7.15   51.1225   2.67395   8.45577      7.65   58.5225   2.76586   8.74643
7.16   51.2656   2.67582   8.46168      7.66   58.6756   2.76767   8.75214
7.17   51.4089   2.67769   8.46759      7.67   58.8289   2.76948   8.75785
7.18   51.5524   2.67955   8.47349      7.68   58.9824   2.77128   8.76356
7.19   51.6961   2.68142   8.47939      7.69   59.1361   2.77308   8.76926
7.20   51.8400   2.68328   8.48528      7.70   59.2900   2.77489   8.77496
7.21   51.9841   2.68514   8.49117      7.71   59.4441   2.77669   8.78066
7.22   52.1284   2.68701   8.49706      7.72   59.5984   2.77849   8.78635
7.23   52.2729   2.68887   8.50294      7.73   59.7529   2.78029   8.79204
7.24   52.4176   2.69072   8.50882      7.74   59.9076   2.78209   8.79773
7.25   52.5625   2.69258   8.51469      7.75   60.0625   2.78388   8.80341
7.26   52.7076   2.69444   8.52056      7.76   60.2176   2.78568   8.80909
7.27   52.8529   2.69629   8.52643      7.77   60.3729   2.78747   8.81476
7.28   52.9984   2.69815   8.53229      7.78   60.5284   2.78927   8.82043
7.29   53.1441   2.70000   8.53815      7.79   60.6841   2.79106   8.82610
7.30   53.2900   2.70185   8.54400      7.80   60.8400   2.79285   8.83176
7.31   53.4361   2.70370   8.54985      7.81   60.9961   2.79464   8.83742
7.32   53.5824   2.70555   8.55570      7.82   61.1524   2.79643   8.84308
7.33   53.7289   2.70740   8.56154      7.83   61.3089   2.79821   8.84873
7.34   53.8756   2.70924   8.56738      7.84   61.4656   2.80000   8.85438
7.35   54.0225   2.71109   8.57321      7.85   61.6225   2.80179   8.86002
7.36   54.1696   2.71293   8.57904      7.86   61.7796   2.80357   8.86566
7.37   54.3169   2.71477   8.58487      7.87   61.9369   2.80535   8.87130
7.38   54.4644   2.71662   8.59069      7.88   62.0944   2.80713   8.87694
7.39   54.6121   2.71846   8.59651      7.89   62.2521   2.80891   8.88257
7.40   54.7600   2.72029   8.60233      7.90   62.4100   2.81069   8.88819
7.41   54.9081   2.72213   8.60814      7.91   62.5681   2.81247   8.89382
7.42   55.0564   2.72397   8.61394      7.92   62.7264   2.81425   8.89944
7.43   55.2049   2.72580   8.61974      7.93   62.8849   2.81603   8.90505
7.44   55.3536   2.72764   8.62554      7.94   63.0436   2.81780   8.91067
7.45   55.5025   2.72947   8.63134      7.95   63.2025   2.81957   8.91628
7.46   55.6516   2.73130   8.63713      7.96   63.3616   2.82135   8.92188
7.47   55.8009   2.73313   8.64292      7.97   63.5209   2.82312   8.92749
7.48   55.9504   2.73496   8.64870      7.98   63.6804   2.82489   8.93308
7.49   56.1001   2.73679   8.65448      7.99   63.8401   2.82666   8.93868
7.50   56.2500   2.73861   8.66025      8.00   64.0000   2.82843   8.94427

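SQUARES AND SQUARE ROOTS

 N       N²        √N      √(10N)       N       N²        √N      √(10N)

8.00   64.0000   2.82843   8.94427      8.50   72.2500   2.91548   9.21954
8.01   64.1601   2.83019   8.94986      8.51   72.4201   2.91719   9.22497
8.02   64.3204   2.83196   8.95545      8.52   72.5904   2.91890   9.23038
8.03   64.4809   2.83373   8.96103      8.53   72.7609   2.92062   9.23580
8.04   64.6416   2.83549   8.96660      8.54   72.9316   2.92233   9.24121
8.05   64.8025   2.83725   8.97218      8.55   73.1025   2.92404   9.24662
8.06   64.9636   2.83901   8.97775      8.56   73.2736   2.92575   9.25203
8.07   65.1249   2.84077   8.98332      8.57   73.4449   2.92746   9.25743
8.08   65.2864   2.84253   8.98888      8.58   73.6164   2.92916   9.26283
8.09   65.4481   2.84429   8.99444      8.59   73.7881   2.93087   9.26823
8.10   65.6100   2.84605   9.00000      8.60   73.9600   2.93258   9.27362
8.11   65.7721   2.84781   9.00555      8.61   74.1321   2.93428   9.27901
8.12   65.9344   2.84956   9.01110      8.62   74.3044   2.93598   9.28440
8.13   66.0969   2.85132   9.01665      8.63   74.4769   2.93769   9.28978
8.14   66.2596   2.85307   9.02219      8.64   74.6496   2.93939   9.29516
8.15   66.4225   2.85482   9.02774      8.65   74.8225   2.94109   9.30054
8.16   66.5856   2.85657   9.03327      8.66   74.9956   2.94279   9.30591
8.17   66.7489   2.85832   9.03881      8.67   75.1689   2.94449   9.31128
8.18   66.9124   2.86007   9.04434      8.68   75.3424   2.94618   9.31665
8.19   67.0761   2.86182   9.04986      8.69   75.5161   2.94788   9.32202
8.20   67.2400   2.86356   9.05539      8.70   75.6900   2.94958   9.32738
8.21   67.4041   2.86531   9.06091      8.71   75.8641   2.95127   9.33274
8.22   67.5684   2.86705   9.06642      8.72   76.0384   2.95296   9.33809
8.23   67.7329   2.86880   9.07193      8.73   76.2129   2.95466   9.34345
8.24   67.8976   2.87054   9.07744      8.74   76.3876   2.95635   9.34880
8.25   68.0625   2.87228   9.08295      8.75   76.5625   2.95804   9.35414
8.26   68.2276   2.87402   9.08845      8.76   76.7376   2.95973   9.35949
8.27   68.3929   2.87576   9.09395      8.77   76.9129   2.96142   9.36483
8.28   68.5584   2.87750   9.09945      8.78   77.0884   2.96311   9.37017
8.29   68.7241   2.87924   9.10494      8.79   77.2641   2.96479   9.37550
8.30   68.8900   2.88097   9.11043      8.80   77.4400   2.96648   9.38083
8.31   69.0561   2.88271   9.11592      8.81   77.6161   2.96816   9.38616
8.32   69.2224   2.88444   9.12140      8.82   77.7924   2.96985   9.39149
8.33   69.3889   2.88617   9.12688      8.83   77.9689   2.97153   9.39681
8.34   69.5556   2.88791   9.13236      8.84   78.1456   2.97321   9.40213
8.35   69.7225   2.88964   9.13783      8.85   78.3225   2.97489   9.40744
8.36   69.8896   2.89137   9.14330      8.86   78.4996   2.97658   9.41276
8.37   70.0569   2.89310   9.14877      8.87   78.6769   2.97825   9.41807
8.38   70.2244   2.89482   9.15423      8.88   78.8544   2.97993   9.42338
8.39   70.3921   2.89655   9.15969      8.89   79.0321   2.98161   9.42868
8.40   70.5600   2.89828   9.16515      8.90   79.2100   2.98329   9.43398
8.41   70.7281   2.90000   9.17061      8.91   79.3881   2.98496   9.43928
8.42   70.8964   2.90172   9.17606      8.92   79.5664   2.98664   9.44458
8.43   71.0649   2.90345   9.18150      8.93   79.7449   2.98831   9.44987
8.44   71.2336   2.90517   9.18695      8.94   79.9236   2.98998   9.45516
8.45   71.4025   2.90689   9.19239      8.95   80.1025   2.99166   9.46044
8.46   71.5716   2.90861   9.19783      8.96   80.2816   2.99333   9.46573
8.47   71.7409   2.91033   9.20326      8.97   80.4609   2.99500   9.47101
8.48   71.9104   2.91204   9.20869      8.98   80.6404   2.99666   9.47629
8.49   72.0801   2.91376   9.21412      8.99   80.8201   2.99833   9.48156
8.50   72.2500   2.91548   9.21954      9.00   81.0000   3.00000   9.48683
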
 N       N²        √N      √(10N)       N       N²        √N      √(10N)

9.00   81.0000   3.00000   9.48683      9.50   90.2500   3.08221   9.74679
9.01   81.1801   3.00167   9.49210      9.51   90.4401   3.08383   9.75192
9.02   81.3604   3.00333   9.49737      9.52   90.6304   3.08545   9.75705
9.03   81.5409   3.00500   9.50263      9.53   90.8209   3.08707   9.76217
9.04   81.7216   3.00666   9.50789      9.54   91.0116   3.08869   9.76729
9.05   81.9025   3.00832   9.51315      9.55   91.2025   3.09031   9.77241
9.06   82.0836   3.00998   9.51840      9.56   91.3936   3.09192   9.77753
9.07   82.2649   3.01164   9.52365      9.57   91.5849   3.09354   9.78264
9.08   82.4464   3.01330   9.52890      9.58   91.7764   3.09516   9.78775
9.09   82.6281   3.01496   9.53415      9.59   91.9681   3.09677   9.79285
9.10   82.8100   3.01662   9.53939      9.60   92.1600   3.09839   9.79796
9.11   82.9921   3.01828   9.54463      9.61   92.3521   3.10000   9.80306
9.12   83.1744   3.01993   9.54987      9.62   92.5444   3.10161   9.80816
9.13   83.3569   3.02159   9.55510      9.63   92.7369   3.10322   9.81326
9.14   83.5396   3.02324   9.56033      9.64   92.9296   3.10483   9.81835
9.15   83.7225   3.02490   9.56556      9.65   93.1225   3.10644   9.82344
9.16   83.9056   3.02655   9.57079      9.66   93.3156   3.10805   9.82853
9.17   84.0889   3.02820   9.57601      9.67   93.5089   3.10966   9.83362
9.18   84.2724   3.02985   9.58123      9.68   93.7024   3.11127   9.83870
9.19   84.4561   3.03150   9.58645      9.69   93.8961   3.11288   9.84378
9.20   84.6400   3.03315   9.59166      9.70   94.0900   3.11448   9.84886
9.21   84.8241   3.03480   9.59687      9.71   94.2841   3.11609   9.85393
9.22   85.0084   3.03645   9.60208      9.72   94.4784   3.11769   9.85901
9.23   85.1929   3.03809   9.60729      9.73   94.6729   3.11929   9.86408
9.24   85.3776   3.03974   9.61249      9.74   94.8676   3.12090   9.86914
9.25   85.5625   3.04138   9.61769      9.75   95.0625   3.12250   9.87421
9.26   85.7476   3.04302   9.62289      9.76   95.2576   3.12410   9.87927
9.27   85.9329   3.04467   9.62808      9.77   95.4529   3.12570   9.88433
9.28   86.1184   3.04631   9.63328      9.78   95.6484   3.12730   9.88939
9.29   86.3041   3.04795   9.63846      9.79   95.8441   3.12890   9.89444
9.30   86.4900   3.04959   9.64365      9.80   96.0400   3.13050   9.89949
9.31   86.6761   3.05123   9.64883      9.81   96.2361   3.13209   9.90454
9.32   86.8624   3.05287   9.65401      9.82   96.4324   3.13369   9.90959
9.33   87.0489   3.05450   9.65919      9.83   96.6289   3.13528   9.91464
9.34   87.2356   3.05614   9.66437      9.84   96.8256   3.13688   9.91968
9.35   87.4225   3.05778   9.66954      9.85   97.0225   3.13847   9.92472
9.36   87.6096   3.05941   9.67471      9.86   97.2196   3.14006   9.92975
9.37   87.7969   3.06105   9.67988      9.87   97.4169   3.14166   9.93479
9.38   87.9844   3.06268   9.68504      9.88   97.6144   3.14325   9.93982
9.39   88.1721   3.06431   9.69020      9.89   97.8121   3.14484   9.94485
9.40   88.3600   3.06594   9.69536      9.90   98.0100   3.14643   9.94987
9.41   88.5481   3.06757   9.70052      9.91   98.2081   3.14802   9.95490
9.42   88.7364   3.06920   9.70567      9.92   98.4064   3.14960   9.95992
9.43   88.9249   3.07083   9.71082      9.93   98.6049   3.15119   9.96494
9.44   89.1136   3.07246   9.71597      9.94   98.8036   3.15278   9.96995
9.45   89.3025   3.07409   9.72111      9.95   99.0025   3.15436   9.97497
9.46   89.4916   3.07571   9.72625      9.96   99.2016   3.15595   9.97998
9.47   89.6809   3.07734   9.73139      9.97   99.4009   3.15753   9.98499
9.48   89.8704   3.07896   9.73653      9.98   99.6004   3.15911   9.98999
9.49   90.0601   3.08058   9.74166      9.99   99.8001   3.16070   9.99500
9.50   90.2500   3.08221   9.74679     10.00   100.000   3.16228   10.0000

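
The four columns of the table above (N, N², √N, √(10N)) can be regenerated directly, which is a convenient way to check a doubtful entry. The following is only a sketch: the function name `table_row` is hypothetical, and the rounding (four places for N², five for the roots) is assumed from the printed entries.

```python
import math


def table_row(n: float) -> tuple[float, float, float, float]:
    """Return (N, N^2, sqrt(N), sqrt(10N)) rounded as in the printed table.

    Assumption: squares are shown to 4 decimals and roots to 5 decimals.
    """
    return (
        round(n, 2),
        round(n * n, 4),
        round(math.sqrt(n), 5),
        round(math.sqrt(10 * n), 5),
    )


if __name__ == "__main__":
    # The sqrt(10N) column extends the table's reach: the row N = 6.25
    # gives sqrt(6.25) and also sqrt(62.5).
    for n in (6.25, 9.58):
        print(table_row(n))
```

Because √(100N) = 10√N, the two root columns together give the square root of any number once the decimal point is shifted an even number of places.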


NORMAL AREAS AND ORDINATES *

  t     φ(t)   ∫₀ᵗφ(t)dt       t     φ(t)   ∫₀ᵗφ(t)dt       t     φ(t)   ∫₀ᵗφ(t)dt

 .00   .39894   .00000        .45   .36053   .17364        .90   .26609   .31594
 .01   .39892   .00399        .46   .35889   .17724        .91   .26369   .31859
 .02   .39886   .00798        .47   .35723   .18082        .92   .26129   .32121
 .03   .39876   .01197        .48   .35553   .18439        .93   .25888   .32381
 .04   .39862   .01595        .49   .35381   .18793        .94   .25647   .32639
 .05   .39844   .01994        .50   .35207   .19146        .95   .25406   .32894
 .06   .39822   .02392        .51   .35029   .19497        .96   .25164   .33147
 .07   .39797   .02790        .52   .34849   .19847        .97   .24923   .33398
 .08   .39767   .03188        .53   .34667   .20194        .98   .24681   .33646
 .09   .39733   .03586        .54   .34482   .20540        .99   .24439   .33891
 .10   .39695   .03983        .55   .34294   .20884       1.00   .24197   .34134
 .11   .39654   .04380        .56   .34105   .21226       1.01   .23955   .34375
 .12   .39608   .04776        .57   .33912   .21566       1.02   .23713   .34614
 .13   .39559   .05172        .58   .33718   .21904       1.03   .23471   .34850
 .14   .39505   .05567        .59   .33521   .22240       1.04   .23230   .35083
 .15   .39448   .05962        .60   .33322   .22575       1.05   .22988   .35314
 .16   .39387   .06356        .61   .33121   .22907       1.06   .22747   .35543
 .17   .39322   .06749        .62   .32918   .23237       1.07   .22506   .35769
 .18   .39253   .07142        .63   .32713   .23565       1.08   .22265   .35993
 .19   .39181   .07535        .64   .32506   .23891       1.09   .22025   .36214
 .20   .39104   .07926        .65   .32297   .24215       1.10   .21785   .36433
 .21   .39024   .08317        .66   .32086   .24537       1.11   .21546   .36650
 .22   .38940   .08706        .67   .31874   .24857       1.12   .21307   .36864
 .23   .38853   .09095        .68   .31659   .25175       1.13   .21069   .37076
 .24   .38762   .09483        .69   .31443   .25490       1.14   .20831   .37286
 .25   .38667   .09871        .70   .31225   .25804       1.15   .20594   .37493
 .26   .38568   .10257        .71   .31006   .26115       1.16   .20357   .37698
 .27   .38466   .10642        .72   .30785   .26424       1.17   .20121   .37900
 .28   .38361   .11026        .73   .30563   .26730       1.18   .19886   .38100
 .29   .38251   .11409        .74   .30339   .27035       1.19   .19652   .38298
 .30   .38139   .11791        .75   .30114   .27337       1.20   .19419   .38493
 .31   .38023   .12172        .76   .29887   .27637       1.21   .19186   .38686
 .32   .37903   .12552        .77   .29659   .27935       1.22   .18954   .38877
 .33   .37780   .12930        .78   .29431   .28230       1.23   .18724   .39065
 .34   .37654   .13307        .79   .29200   .28524       1.24   .18494   .39251
 .35   .37524   .13683        .80   .28969   .28814       1.25   .18265   .39435
 .36   .37391   .14058        .81   .28737   .29103       1.26   .18037   .39617
 .37   .37255   .14431        .82   .28504   .29389       1.27   .17810   .39796
 .38   .37115   .14803        .83   .28269   .29673       1.28   .17585   .39973
 .39   .36973   .15173        .84   .28034   .29955       1.29   .17360   .40147
 .40   .36827   .15542        .85   .27798   .30234       1.30   .17137   .40320
 .41   .36678   .15910        .86   .27562   .30511       1.31   .16915   .40490
 .42   .36526   .16276        .87   .27324   .30785       1.32   .16694   .40658
 .43   .36371   .16640        .88   .27086   .31057       1.33   .16474   .40824
 .44   .36213   .17003        .89   .26848   .31327       1.34   .16256   .40988


* Reprinted, by permission, from Kenney, Mathematics of Statistics, Part One, pp. 225-227, D. Van Nostrand, New York.

NORMAL AREAS AND ORDINATES

  t     φ(t)   ∫₀ᵗφ(t)dt       t     φ(t)   ∫₀ᵗφ(t)dt       t     φ(t)   ∫₀ᵗφ(t)dt

1.35   .16038   .41149       1.80   .07895   .46407       2.25   .03174   .48778
1.36   .15822   .41309       1.81   .07754   .46485       2.26   .03103   .48809
1.37   .15608   .41466       1.82   .07614   .46562       2.27   .03034   .48840
1.38   .15395   .41621       1.83   .07477   .46638       2.28   .02965   .48870
1.39   .15183   .41774       1.84   .07341   .46712       2.29   .02898   .48899
1.40   .14973   .41924       1.85   .07206   .46784       2.30   .02833   .48928
1.41   .14764   .42073       1.86   .07074   .46856       2.31   .02768   .48956
1.42   .14556   .42220       1.87   .06943   .46926       2.32   .02705   .48983
1.43   .14350   .42364       1.88   .06814   .46995       2.33   .02643   .49010
1.44   .14146   .42507       1.89   .06687   .47062       2.34   .02582   .49036
1.45   .13943   .42647       1.90   .06562   .47128       2.35   .02522   .49061
1.46   .13742   .42786       1.91   .06439   .47193       2.36   .02463   .49086
1.47   .13542   .42922       1.92   .06316   .47257       2.37   .02406   .49111
1.48   .13344   .43056       1.93   .06195   .47320       2.38   .02349   .49134
1.49   .13147   .43189       1.94   .06077   .47381       2.39   .02294   .49158
1.50   .12952   .43319       1.95   .05959   .47441       2.40   .02239   .49180
1.51   .12758   .43448       1.96   .05844   .47500       2.41   .02186   .49202
1.52   .12566   .43574       1.97   .05730   .47558       2.42   .02134   .49224
1.53   .12376   .43699       1.98   .05618   .47615       2.43   .02083   .49245
1.54   .12188   .43822       1.99   .05508   .47670       2.44   .02033   .49266
1.55   .12001   .43943       2.00   .05399   .47725       2.45   .01984   .49286
1.56   .11816   .44062       2.01   .05292   .47778       2.46   .01936   .49305
1.57   .11632   .44179       2.02   .05186   .47831       2.47   .01889   .49324
1.58   .11450   .44295       2.03   .05082   .47882       2.48   .01842   .49343
1.59   .11270   .44408       2.04   .04980   .47932       2.49   .01797   .49361
1.60   .11092   .44520       2.05   .04879   .47982       2.50   .01753   .49379
1.61   .10915   .44630       2.06   .04780   .48030       2.51   .01709   .49396
1.62   .10741   .44738       2.07   .04682   .48077       2.52   .01667   .49413
1.63   .10567   .44845       2.08   .04586   .48124       2.53   .01625   .49430
1.64   .10396   .44950       2.09   .04491   .48169       2.54   .01585   .49446
1.65   .10226   .45053       2.10   .04398   .48214       2.55   .01545   .49461
1.66   .10059   .45154       2.11   .04307   .48257       2.56   .01506   .49477
1.67   .09893   .45254       2.12   .04217   .48300       2.57   .01468   .49492
1.68   .09728   .45352       2.13   .04128   .48341       2.58   .01431   .49506
1.69   .09566   .45449       2.14   .04041   .48382       2.59   .01394   .49520
1.70   .09405   .45543       2.15   .03955   .48422       2.60   .01358   .49534
1.71   .09246   .45637       2.16   .03871   .48461       2.61   .01323   .49547
1.72   .09089   .45728       2.17   .03788   .48500       2.62   .01289   .49560
1.73   .08933   .45818       2.18   .03706   .48537       2.63   .01256   .49573
1.74   .08780   .45907       2.19   .03626   .48574       2.64   .01223   .49585
1.75   .08628   .45994       2.20   .03547   .48610       2.65   .01191   .49598
1.76   .08478   .46080       2.21   .03470   .48645       2.66   .01160   .49609
1.77   .08329   .46164       2.22   .03394   .48679       2.67   .01130   .49621
1.78   .08183   .46246       2.23   .03319   .48713       2.68   .01100   .49632
1.79   .08038   .46327       2.24   .03246   .48745       2.69   .01071   .49643

  t     φ(t)   ∫₀ᵗφ(t)dt       t     φ(t)   ∫₀ᵗφ(t)dt       t     φ(t)   ∫₀ᵗφ(t)dt

2.70   .01042   .49653       3.15   .00279   .49918       3.60   .00061   .49984
2.71   .01014   .49664       3.16   .00271   .49921       3.61   .00059   .49985
2.72   .00987   .49674       3.17   .00262   .49924       3.62   .00057   .49985
2.73   .00961   .49683       3.18   .00254   .49926       3.63   .00055   .49986
2.74   .00935   .49693       3.19   .00246   .49929       3.64   .00053   .49986
2.75   .00909   .49702       3.20   .00238   .49931       3.65   .00051   .49987
2.76   .00885   .49711       3.21   .00231   .49934       3.66   .00049   .49987
2.77   .00861   .49720       3.22   .00224   .49936       3.67   .00047   .49988
2.78   .00837   .49728       3.23   .00216   .49938       3.68   .00046   .49988
2.79   .00814   .49736       3.24   .00210   .49940       3.69   .00044   .49989
2.80   .00792   .49744       3.25   .00203   .49942       3.70   .00042   .49989
2.81   .00770   .49752       3.26   .00196   .49944       3.71   .00041   .49990
2.82   .00748   .49760       3.27   .00190   .49946       3.72   .00039   .49990
2.83   .00727   .49767       3.28   .00184   .49948       3.73   .00038   .49990
2.84   .00707   .49774       3.29   .00178   .49950       3.74   .00037   .49991
2.85   .00687   .49781       3.30   .00172   .49952       3.75   .00035   .49991
2.86   .00668   .49788       3.31   .00167   .49953       3.76   .00034   .49992
2.87   .00649   .49795       3.32   .00161   .49955       3.77   .00033   .49992
2.88   .00631   .49801       3.33   .00156   .49957       3.78   .00031   .49992
2.89   .00613   .49807       3.34   .00151   .49958       3.79   .00030   .49992
2.90   .00595   .49813       3.35   .00146   .49960       3.80   .00029   .49993
2.91   .00578   .49819       3.36   .00141   .49961       3.81   .00028   .49993
2.92   .00562   .49825       3.37   .00136   .49962       3.82   .00027   .49993
2.93   .00545   .49831       3.38   .00132   .49964       3.83   .00026   .49994
2.94   .00530   .49836       3.39   .00127   .49965       3.84   .00025   .49994
2.95   .00514   .49841       3.40   .00123   .49966       3.85   .00024   .49994
2.96   .00499   .49846       3.41   .00119   .49968       3.86   .00023   .49994
2.97   .00485   .49851       3.42   .00115   .49969       3.87   .00022   .49995
2.98   .00471   .49856       3.43   .00111   .49970       3.88   .00021   .49995
2.99   .00457   .49861       3.44   .00107   .49971       3.89   .00021   .49995
3.00   .00443   .49865       3.45   .00104   .49972       3.90   .00020   .49995
3.01   .00430   .49869       3.46   .00100   .49973       3.91   .00019   .49995
3.02   .00417   .49874       3.47   .00097   .49974       3.92   .00018   .49996
3.03   .00405   .49878       3.48   .00094   .49975       3.93   .00018   .49996
3.04   .00393   .49882       3.49   .00090   .49976       3.94   .00017   .49996
3.05   .00381   .49886       3.50   .00087   .49977       3.95   .00016   .49996
3.06   .00370   .49889       3.51   .00084   .49978       3.96   .00016   .49996
3.07   .00358   .49893       3.52   .00081   .49978       3.97   .00015   .49996
3.08   .00348   .49897       3.53   .00079   .49979       3.98   .00014   .49997
3.09   .00337   .49900       3.54   .00076   .49980       3.99   .00014   .49997
3.10   .00327   .49903       3.55   .00073   .49981
3.11   .00317   .49906       3.56   .00071   .49981
3.12   .00307   .49910       3.57   .00068   .49982
3.13   .00298   .49913       3.58   .00066   .49983
3.14   .00288   .49916       3.59   .00063   .49983
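
The ordinates φ(t) and the areas ∫₀ᵗφ(t)dt tabulated above can be recomputed from the standard normal density, which gives a way to check any entry. This is a minimal sketch under stated assumptions: the function names `ordinate` and `area_from_zero` are hypothetical, and the area is obtained from the error function in Python's standard library rather than by any method described in the book.

```python
import math


def ordinate(t: float) -> float:
    """Standard normal density: phi(t) = exp(-t^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)


def area_from_zero(t: float) -> float:
    """Area under phi between 0 and t, via the identity
    integral_0^t phi(u) du = erf(t / sqrt(2)) / 2.
    """
    return 0.5 * math.erf(t / math.sqrt(2.0))


if __name__ == "__main__":
    # Compare with the table rows for t = 1.00 and t = 1.96.
    for t in (1.00, 1.96):
        print(t, round(ordinate(t), 5), round(area_from_zero(t), 5))
```

Since the curve is symmetric about t = 0, the area between −t and t is twice the tabulated area.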
TABLE IV. STUDENT'S t DISTRIBUTION *

Degrees of       Probability of a deviation greater than t
freedom n       .005     .01    .025     .05      .1     .15

     1        63.657  31.821  12.706   6.314   3.078   1.963
     2         9.925   6.965   4.303   2.920   1.886   1.386
     3         5.841   4.541   3.182   2.353   1.638   1.250
     4         4.604   3.747   2.776   2.132   1.533   1.190
     5         4.032   3.365   2.571   2.015   1.476   1.156
     6         3.707   3.143   2.447   1.943   1.440   1.134
     7         3.499   2.998   2.365   1.895   1.415   1.119
     8         3.355   2.896   2.306   1.860   1.397   1.108
     9         3.250   2.821   2.262   1.833   1.383   1.100
    10         3.169   2.764   2.228   1.812   1.372   1.093
    11         3.106   2.718   2.201   1.796   1.363   1.088
    12         3.055   2.681   2.179   1.782   1.356   1.083
    13         3.012   2.650   2.160   1.771   1.350   1.079
    14         2.977   2.624   2.145   1.761   1.345   1.076
    15         2.947   2.602   2.131   1.753   1.341   1.074
    16         2.921   2.583   2.120   1.746   1.337   1.071
    17         2.898   2.567   2.110   1.740   1.333   1.069
    18         2.878   2.552   2.101   1.734   1.330   1.067
    19         2.861   2.539   2.093   1.729   1.328   1.066
    20         2.845   2.528   2.086   1.725   1.325   1.064
    21         2.831   2.518   2.080   1.721   1.323   1.063
    22         2.819   2.508   2.074   1.717   1.321   1.061
    23         2.807   2.500   2.069   1.714   1.319   1.060
    24         2.797   2.492   2.064   1.711   1.318   1.059
    25         2.787   2.485   2.060   1.708   1.316   1.058
    26         2.779   2.479   2.056   1.706   1.315   1.058
    27         2.771   2.473   2.052   1.703   1.314   1.057
    28         2.763   2.467   2.048   1.701   1.313   1.056
    29         2.756   2.462   2.045   1.699   1.311   1.055
    30         2.750   2.457   2.042   1.697   1.310   1.055
     ∞         2.576   2.326   1.960   1.645   1.282   1.036


The probability of a deviation numerically greater than t is twice the 
probability given at the head of the table. 

* This table is reproduced from Statistical Methods for Research Workers, with the generous
permission of the author, Professor R. A. Fisher, and the publishers, Messrs. Oliver and Boyd.

Degrees of       Probability of a deviation greater than t
freedom n         .2     .25      .3     .35      .4     .45

     1         1.376   1.000    .727    .510    .325    .158
     2         1.061    .816    .617    .445    .289    .142
     3          .978    .765    .584    .424    .277    .137
     4          .941    .741    .569    .414    .271    .134
     5          .920    .727    .559    .408    .267    .132
     6          .906    .718    .553    .404    .265    .131
     7          .896    .711    .549    .402    .263    .130
     8          .889    .706    .546    .399    .262    .130
     9          .883    .703    .543    .398    .261    .129
    10          .879    .700    .542    .397    .260    .129
    11          .876    .697    .540    .396    .260    .129
    12          .873    .695    .539    .395    .259    .128
    13          .870    .694    .538    .394    .259    .128
    14          .868    .692    .537    .393    .258    .128
    15          .866    .691    .536    .393    .258    .128
    16          .865    .690    .535    .392    .258    .128
    17          .863    .689    .534    .392    .257    .128
    18          .862    .688    .534    .392    .257    .127
    19          .861    .688    .533    .391    .257    .127
    20          .860    .687    .533    .391    .257    .127
    21          .859    .686    .532    .391    .257    .127
    22          .858    .686    .532    .390    .256    .127
    23          .858    .685    .532    .390    .256    .127
    24          .857    .685    .531    .390    .256    .127
    25          .856    .684    .531    .390    .256    .127
    26          .856    .684    .531    .390    .256    .127
    27          .855    .684    .531    .389    .256    .127
    28          .855    .683    .530    .389    .256    .127
    29          .854    .683    .530    .389    .256    .127
    30          .854    .683    .530    .389    .256    .127
     ∞          .842    .674    .524    .385    .253    .126


The probability of a deviation numerically greater than t is twice the 
probability given at the head of the table. 
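
The note above says that the column headings are one-tailed probabilities, so a two-sided test at level α reads the column headed α/2. The lookup can be sketched as follows; the names `ROW_N10`, `two_sided_probability`, and `two_sided_critical_value` are hypothetical, and the dictionary simply copies the row for n = 10 degrees of freedom from Table IV.

```python
import math

# One-tailed critical values copied from the n = 10 row of Table IV,
# keyed by the probability printed at the head of each column.
ROW_N10 = {
    0.005: 3.169, 0.01: 2.764, 0.025: 2.228, 0.05: 1.812,
    0.1: 1.372, 0.15: 1.093, 0.2: 0.879, 0.25: 0.700,
    0.3: 0.542, 0.35: 0.397, 0.4: 0.260, 0.45: 0.129,
}


def two_sided_probability(head_probability: float) -> float:
    """Probability of a deviation numerically greater than t:
    twice the probability given at the head of the column."""
    return 2.0 * head_probability


def two_sided_critical_value(alpha: float, row: dict[float, float]) -> float:
    """Return the tabulated t whose two-sided probability is alpha,
    i.e. the entry in the column headed alpha / 2."""
    head = alpha / 2.0
    for probability, t_value in row.items():
        if math.isclose(probability, head):
            return t_value
    raise KeyError(f"no column headed {head} in this row")


if __name__ == "__main__":
    # Two-sided 5 per cent point for 10 degrees of freedom.
    print(two_sided_critical_value(0.05, ROW_N10))
```

Only the tabulated probabilities can be looked up this way; intermediate levels would require interpolation or a fuller table.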
* Reprinted, by permission, from Snedecor, Statistical Methods, Collegiate Press, Iowa State College, Ames.

INDEX

Analysis of variance, 158
  efficiency properties, 217
Array distribution, continuous variable, 101
  discrete variable, 99
  normal variable, 104
Average outgoing quality limit, 223

Binomial distribution, 38
  moment-generating function, 41
  normal curve approximation, 45
  Poisson approximation, 51
  properties, 41
Binomial index of dispersion, 197

Central tendency, 8
Change of variable, 132
Chi-square distribution, 134
  additive property, 138
  applied to contingency tables, 191
  applied to curve fitting, 193
  applied to goodness of fit, 186
  applied to indices of dispersion, 195
  applied to variances, 138, 212
  moment-generating function, 135
  sketch, 134
Chi-square test, 186
  generality of, 189
  limitations on, 191, 199
Class mark, 5
Classification of data, 4
Coefficient, correlation, 83
  regression, 147
Conditional distribution, 102
Conditional probability, 99
Confidence limits, 130
  for means, 144
  for regression coefficients, 147
  for variances, 138
Consumer’s risk, 222
Contingency tables, 191
Control chart, for means, 70
  for percentages, 49
Convergence, stochastic, 174
Correlation, curvilinear, 92
  linear, 81
  serial, 182
Correlation coefficient, 83
  cause and effect, 88
  computation, 85
  multiple, 115
  partial, 116
  quantitative interpretation, 83
  reliability, 88
  serial, 182
  theoretical, 103
Correlation index, 92
Correlation ratio, 93
Covariance, 102
Critical deviation, 46
Critical region, 202
Curve fitting, 78, 89
Curve of regression, 102
Curvilinear correlation, 92
Curvilinear regression, functional, 90
  polynomial, 89

Defective, percentage, 48, 221
Deviation, critical, 46
Difference of two means, confidence limits, 145
  distribution, 71
  moment-generating function, 71
  testing, 71, 145
Difference of two percentages, testing, 72
Discrete frequency distributions, 35
  binomial, 38
  multinomial, 53
  Poisson, 50
Discriminant function, 121
Dispersion, index of, 197
  see also Variation
Distribution, of chi-square, 188
  of correlation coefficient, 89, 183
  of means, 65, 69, 71


255 
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INDEX 


Distribution, of percentages, 46, 72 
of proportion of population in sample 
range, 175 
of range, 164 
of runs, 180 
of successes, 38, 45 
of variances, 138, 152, 212 
Distribution function, 22 
Bernoulli, 38 
binomial, 38 
chi-square, 134, 188 
F, 150 
general, 22 
multinomial, 54 
normal, 28, 103 
Poisson, 50 
Student’s £, 140 
t, 140 

Efficiency of tests, 203 
in experiments, 217 
Error, standard, 66 
two types of, 202 

Error variance, in analysis of variance, 
161 

in linear regression, 84 
Estimate, maximum likelihood, 207 
unbiased, 129 
Expected value, 128 
Experimental error, 159 

F distribution, derivation, 150 

for testing equality of two variances, 
152 

for testing homogeneity of means, 
154 

for testing homogeneity of rows and 
columns, 158 
sketch, 153 
use of tables for, 153 
Fraction defective, lot tolerance, 221 
process average, 222 
Frequency distribution, 1 
see also Distribution 
Function, continuous distribution, 23 
discrete distribution, 35 
gamma, 25 
likelihood, 207 
linear discriminant, 121 
moment-generating, 26 


Gamma function, 25 
Geometric mean, 18 
Goodness of fit, degrees of freedom in, 
193 

for binomial distribution, 195 
for normal distribution, 194 
for Poisson distribution, 195 
testing by chi square, 186 

Histogram, 6 

Homogeneity,of means, test for, 154,158 
of variances, test for, 152, 210 
Homoscedasticity, 106 
Hypotheses, composite, 209 
simple, 208 
testing, 56, 202 

Independent variables, 62 
normal, 104 
sum of, 63 

Index, correlation, 92 

of dispersion, binomial, 197 
Poisson, 197 

Inequality, Tchebycheff’s, 172 
Inspection, sampling, 221 
minimum, 223 

Kurtosis, 16 

Law of large numbers, 174 
Least squares, 79 

for curvilinear regression, 90 
for linear regression, 79 
for multiple regression, 111 
Likelihood, function, 207 
ratio, 208, 210, 229 
Linear discriminant function, 121 
Linear regression, for two variables, 78 
for several variables, 110 
normal equations of, 90, 111 
Lot tolerance fraction defective, 221 

Marginal distribution, continuous, 101 
discrete, 98 
normal, 103 

Maximum likelihood, estimation by, 207 
principle of, 206 
Mean, 8 

computation, 9 
confidence limits, 144 
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Mean, control chart, 70 
difference of two means, 71, 145 
distribution, 66, 69 
homogeneity of several means, 154 
Mean deviation, 19 
Median, 18 
Mode, 18 

Moment-generating function, 26, 36 
of binomial distribution, 41 
of chi-square distribution, 135 
of normal distribution, 30 
of Poisson distribution, 50 
of sum of independent variables, 63 
properties of, 28 

relation to distribution function, 45, 57

Moments, computation of, 17 
of a binomial distribution, 41 
of a continuous variable, 24 
of a discrete variable, 35 
of a normal distribution, 30 
of frequency distributions, 8, 11 
Multinomial distribution, 53 
Multiple correlation coefficient, 115 
Multiple linear regression, 110 

Non-parametric methods, 171 
Normal distribution, of one variable, 28 
fitting to histogram, 32 
moment-generating function of, 30 
moments of, 30 
properties of, 31 
standard, 33 
of two variables, 103 
array distribution of, 104 
geometrical representation of, 107 
independence in, 104 
marginal distribution of, 103 
properties of, 104 

Normal equations of least squares, 90 
for linear regression, 90 
for multiple regression, 111 

Partial correlation coefficient, 116 
Peakedness, measure of, 15 
Percentage, defective, 48, 221 
difference of two percentages, 72 
distribution of, 46 
Plane of regression, 113 
Poisson distribution, 50 
approximation to binomial, 51
fitting to data, 52 
moment-generating function of, 50 
Poisson index of dispersion, 197 
Polynomial regression, 89 
Population, 1 
Probable error, 67 
Probability, 2, 23 
basic rules of, 36 
conditional, 99 
density, 99 
discrete, 98 

Process average fraction defective, 222 
Producer’s risk, 222 
Product moments, 102 
Proportion, of population in sample range, 175
see also Percentage 
Public-opinion polls, 227 

Quality control chart, for means, 70 
for percentage defective, 49 

Random sampling, 22, 63 
Randomization, principle of, 216 
Randomness of sequences, 177 
testing by runs, 177 
testing by serial correlation, 182 
Range, 19 
distribution of, 164 
relation to standard deviation, 165 
Regression, curvilinear, 89 
functional linear, 90 
linear, 78 
polynomial, 89 

Regression coefficient, confidence limits for, 147

Regression curve, 102
Regression line, 80 
Regression plane, 113 
Replication, principle of, 216 
Representative sampling, 227 
Runs, 177 
distribution of, 180 
tables for, 181 

Sample, 1 

Sampling, random, 22, 63 
representative, 227 
stratified, 226
Sampling inspection, 221 
Scatter diagram, 78 
Sequential analysis, 228 

for binomial distributions, 230 
probability ratio test of, 229 
Serial correlation, 182 
Significance level, 56 
Skewness, 7, 15 
Standard deviation, 11 
computation of, 12 
relation to range, 165 
Standard error, 66 
of estimate, 114 
Standard unit, 32 
Statistical hypotheses, 56, 201 
tests of, 56, 202 
Stochastic convergence, 174 
Stratified sampling, 226 
Student’s t distribution, 140 
applied to means, 144, 145 
applied to regression coefficients, 147
derivation, 141 
sketch, 144 

Sum of squares, distribution of, 135

t distribution, 140 

see also Student’s t distribution 
Tables, for range to standard deviation, 165
for runs, 181

For all other variables, see pp. £48-261
Tchebycheff’s inequality, 172 
Tolerance limits, 174 
Two types of error, 202 

for determining size of experiments, 219

for increasing efficiency of tests, 218 
for testing hypotheses, 203 
in sampling inspection, 221 

Unbiased estimate, 129 
of the variance, 128, 130

χ², see Chi-square distribution

Validity, 215 
Variable, change of, 132 
continuous, 22 
discrete, 35 
Variance, 12 

computation of, 12
confidence limits for, 138 
distribution of, 138 
testing homogeneity of two variances, 152

testing homogeneity of several variances, 210

unbiased estimate of, 128, 130 
Variation, measures of, 11 